CHAPTER 1


Section 1.1 


1. 

a. Los Angeles Times, Oberlin Tribune, Gainesville Sun, Washington Post 

b. Duke Energy, Clorox, Seagate, Neiman Marcus 

c. Vince Correa, Catherine Miller, Michael Cutler, Ken Lee

d. 2.97, 3.56, 2.20, 2.97 

3.

a. How likely is it that more than half of the sampled computers will need or have needed 
warranty service? What is the expected number among the 100 that need warranty 
service? How likely is it that the number needing warranty service will exceed the 
expected number by more than 10? 

b. Suppose that 15 of the 100 sampled needed warranty service. How confident can we be 
that the proportion of all such computers needing warranty service is between .08 and
.22? Does the sample provide compelling evidence for concluding that more than 10% of 
all such computers need warranty service? 

5. 

a. No. All students taking a large statistics course who participate in an SI program of this 
sort. 

b. The advantage to randomly allocating students to the two groups is that the two groups 
should then be fairly comparable before the study. If the two groups perform differently 
in the class, we might attribute this to the treatments (SI and control). If it were left to 
students to choose, stronger or more dedicated students might gravitate toward SI, 
confounding the results. 

c. If all students were put in the treatment group, there would be no firm basis for assessing
the effectiveness of SI (nothing to which the SI scores could reasonably be compared). 

7. One could generate a simple random sample of all single-family homes in the city, or a
stratified random sample by taking a simple random sample from each of the 10 district
neighborhoods. From each of the selected homes, values of all desired variables would be
determined. This would be an enumerative study because there exists a finite, identifiable
population of objects from which to sample.




© 2016 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 


Chapter 1: Overview and Descriptive Statistics 


9. 

a. There could be several explanations for the variability of the measurements. Among 
them could be measurement error (due to mechanical or technical changes across 
measurements), recording error, differences in weather conditions at time of 
measurements, etc. 

b. No, because there is no sampling frame. 

Section 1.2 
11. 
3L | 1
3H | 56678
4L | 000112222234
4H | 5667888          stem: tenths
5L | 144              leaf: hundredths
5H | 58
6L | 2
6H | 6678
7L |
7H | 5

The stem-and-leaf display shows that .45 is a good representative value for the data. In
addition, the display is not symmetric and appears to be positively skewed. The range of the
data is .75 - .31 = .44, which is comparable to the typical value of .45. This constitutes a
reasonably large amount of variation in the data. The data value .75 is a possible outlier.

13. 
a. 
12 | 2          stem: tens
12 | 445        leaf: ones
12 | 6667777
12 | 889999
13 | 00011111111
13 | 2222222222333333333333333
13 | 44444444444444444455555555555555555555
13 | 6666666666667777777777
13 | 888888888888999999
14 | 0000001111
14 | 2333333
14 | 444
14 | 77


The observations are highly concentrated at around 134 or 135, where the display 
suggests the typical value falls. 
b.


The histogram of ultimate strengths is symmetric and unimodal, with the point of 
symmetry at approximately 135 ksi. There is a moderate amount of variation, and there 
are no gaps or outliers in the distribution. 


15. 


American movie times are unimodal and strongly positively skewed, while French movie times
appear to be bimodal. A typical American movie runs about 95 minutes, while French movies
are typically either around 95 minutes or around 125 minutes. American movies are generally
shorter than French movies and are less variable in length. Finally, both American and French
movies occasionally run very long (outliers at 162 minutes and 158 minutes, respectively, in
the samples).
17. The sample size for this data set is n = 7 + 20 + 26 + ... + 3 + 2 = 108.
a. “At most five bidders” means 2, 3, 4, or 5 bidders. The proportion of contracts that
involved at most 5 bidders is (7 + 20 + 26 + 16)/108 = 69/108 = .639.
Similarly, the proportion of contracts that involved at least 5 bidders (5 through 11) is
equal to (16 + 11 + 9 + 6 + 8 + 3 + 2)/108 = 55/108 = .509.




b. The number of contracts with between 5 and 10 bidders, inclusive, is 16 + 11 + 9 + 6 + 8
+ 3 = 53, so the proportion is 53/108 = .491. “Strictly” between 5 and 10 means 6, 7, 8, or
9 bidders, for a proportion equal to (11 + 9 + 6 + 8)/108 = 34/108 = .315.
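
The proportions in parts (a) and (b) can be checked with a short script. This is only a sketch: the frequency list is read off the sums quoted in the solution (bidder counts 2 through 11), and the helper name `proportion` is made up for illustration.

```python
# Frequencies quoted in the solution: number of bidders (2..11) -> number of contracts.
freq = dict(zip(range(2, 12), [7, 20, 26, 16, 11, 9, 6, 8, 3, 2]))
n = sum(freq.values())  # sample size

def proportion(lo, hi):
    """Proportion of contracts with lo <= number of bidders <= hi (hypothetical helper)."""
    return sum(count for bidders, count in freq.items() if lo <= bidders <= hi) / n

at_most_5 = proportion(2, 5)
at_least_5 = proportion(5, 11)
between_5_and_10 = proportion(5, 10)        # inclusive
strictly_between_5_and_10 = proportion(6, 9)

print(n, at_most_5, at_least_5, between_5_and_10, strictly_between_5_and_10)
```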




c. The distribution of number of bidders is positively skewed, ranging from 2 to 11 bidders,
with a typical value of around 4-5 bidders. 


19.

a. From this frequency distribution, the proportion of wafers that contained at least one 
particle is (100-1)/100 = .99, or 99%. Note that it is much easier to subtract 1 (which is 
the number of wafers that contain 0 particles) from 100 than it would be to add all the 
frequencies for 1, 2, 3,... particles. In a similar fashion, the proportion containing at least 
5 particles is (100 - 1 - 2 - 3 - 12 - 11)/100 = 71/100 = .71, or 71%.


b. The proportion containing between 5 and 10 particles is (15 + 18 + 10 + 12 + 4 + 5)/100 =
64/100 = .64, or 64%. The proportion that contain strictly between 5 and 10 (meaning
strictly more than 5 and strictly less than 10) is (18 + 10 + 12 + 4)/100 = 44/100 = .44, or
44%.


c. The following histogram was constructed using Minitab. The histogram is almost
symmetric and unimodal; however, the distribution has a few smaller modes and has a 
very slight positive skew. 


[Histogram: Number of contaminating particles]


a. A histogram of the y data appears below. From this histogram, the proportion of
subdivisions having no cul-de-sacs (i.e., y = 0) is 17/47 = .362, or 36.2%. The proportion
having at least one cul-de-sac (y ≥ 1) is (47 - 17)/47 = 30/47 = .638, or 63.8%. Note that
subtracting the number of subdivisions with y = 0 from the total, 47, is an easy way to find
the number of subdivisions with y ≥ 1.


b. A histogram of the z data appears below. From this histogram, the proportion of
subdivisions with at most 5 intersections (i.e., z ≤ 5) is 42/47 = .894, or 89.4%. The
proportion having fewer than 5 intersections (i.e., z < 5) is 39/47 = .830, or 83.0%.




23. Note: since the class intervals have unequal length, we must use a density scale. 


[Histogram (density scale): Tantrum duration]


The distribution of tantrum durations is unimodal and heavily positively skewed. Most
tantrums last between 0 and 11 minutes, but a few last more than half an hour! With such
heavy skewness, it’s difficult to give a representative value.




25. The transformation creates a much more symmetric, mound-shaped histogram. 




Histogram of original data: 


Histogram of transformed data:


a. The endpoints of the class intervals overlap. For example, the value 50 falls in both of
the intervals 0-50 and 50-100.


b. The lifetime distribution is positively skewed. A representative value is around 100.
There is a great deal of variability in lifetimes and several possible candidates for
outliers.


Class Interval   Frequency   Relative Frequency
0-<50                9           0.18
50-<100             19           0.38
100-<150            11           0.22
150-<200             4           0.08
200-<250             2           0.04
250-<300             2           0.04
300-<350             1           0.02
350-<400             1           0.02
400-<450             0           0.00
450-<500             0           0.00
500-<550             1           0.02
Total               50           1.00


c. There is much more symmetry in the distribution of the transformed values than in the
values themselves, and less variability. There are no longer gaps or obvious outliers.


Class Interval   Frequency   Relative Frequency
2.25-<2.75           2           0.04
2.75-<3.25           2           0.04
3.25-<3.75           3           0.06
3.75-<4.25           8           0.16
4.25-<4.75          18           0.36
4.75-<5.25          10           0.20
5.25-<5.75           4           0.08
5.75-<6.25           3           0.06

d. The proportion of lifetime observations in this sample that are less than 100 is .18 + .38 =
.56, and the proportion that is at least 200 is .04 + .04 + .02 + .02 + .02 = .14.


29.
Physical Activity   Frequency   Relative Frequency
A                       28          .28
B                       19          .19
C                       18          .18
D                       17          .17
E                        9          .09
F                        9          .09
Total                  100         1.00

[Bar chart: Type of Physical Activity]
31. 

Class         Frequency   Cum. Freq.   Cum. Rel. Freq.
0.0-<4.0          2           2           0.050
4.0-<8.0         14          16           0.400
8.0-<12.0        11          27           0.675
12.0-<16.0        8          35           0.875
16.0-<20.0        4          39           0.975
20.0-<24.0        0          39           0.975
24.0-<28.0        1          40           1.000
Section 1.3


33. 
a. Using software, x̄ = 640.5 ($640,500) and x̃ = 582.5 ($582,500). The average sale price
for a home in this sample was $640,500. Half the sales were for less than $582,500, while
half were for more than $582,500.


b. Changing that one value lowers the sample mean to 610.5 ($610,500) but has no effect on 
the sample median. 


c. After removing the two largest and two smallest values, x̄_tr(20) = 591.2 ($591,200).


d. A 10% trimmed mean from removing just the highest and lowest values is x̄_tr(10) = 596.3.
To form a 15% trimmed mean, take the average of the 10% and 20% trimmed means to
get x̄_tr(15) = (591.2 + 596.3)/2 = 593.75 ($593,750).


35. The sample size is n = 15. 
a. The sample mean is x̄ = 18.55/15 = 1.237 µg/g and the sample median is x̃ = the 8th
ordered value = .56 µg/g. These values are very different due to the heavy positive
skewness in the data.


b. A 1/15 trimmed mean is obtained by removing the largest and smallest values and
averaging the remaining 13 numbers: (.22 + ... + 3.07)/13 = 1.162. Similarly, a 2/15
trimmed mean is the average of the middle 11 values: (.25 + ... + 2.25)/11 = 1.074. Since
the average of 1/15 and 2/15 is .1 (10%), a 10% trimmed mean is given by the midpoint
of these two trimmed means: (1.162 + 1.074)/2 = 1.118 µg/g.


c. The median of the data set will remain .56 so long as that’s the 8th ordered observation.
Hence, the value .20 could be increased to as high as .56 without changing the fact that
the 8th ordered observation is .56. Equivalently, .20 could be increased by as much as .36
without affecting the value of the sample median.


37. x̄ = 12.01, x̃ = 11.35, x̄_tr(10) = 11.46. The median or the trimmed mean would be better
choices than the mean because of the outlier 21.9.


39. 


a. Σx_i = 16.475, so x̄ = 16.475/16 = 1.0297; x̃ = (1.007 + 1.011)/2 = 1.009.


b. 1.394 can be decreased until it reaches 1.011 (i.e., by 1.394 - 1.011 = 0.383), the larger
of the two middle values. If it is decreased by more than 0.383, the median will change.


41.
a. x/n = 7/10 = .7
b. x̄ = .70 = the sample proportion of successes
c. To have x/n equal .80 requires x/25 = .80 or x = (.80)(25) = 20. There are 7 successes (S)
already, so another 20 - 7 = 13 would be required.
43. The median and certain trimmed means can be calculated, while the mean cannot, because the exact
values of the “100+” observations are required to calculate the mean. x̃ = 68.0,
x̄_tr(20) = 66.2, x̄_tr(30) = 67.5.


Section 1.4 


45. 
a. x̄ = 115.58. The deviations from the mean are 116.4 - 115.58 = .82, 115.9 - 115.58 =
.32, 114.6 - 115.58 = -.98, 115.2 - 115.58 = -.38, and 115.8 - 115.58 = .22. Notice that
the deviations from the mean sum to zero, as they should.


b. s² = [(.82)² + (.32)² + (-.98)² + (-.38)² + (.22)²]/(5 - 1) = 1.928/4 = .482, so s = .694.


c. Σx_i² = 66795.61, so s² = S_xx/(n - 1) = (Σx_i² - (Σx_i)²/n)/(n - 1) =
(66795.61 - (577.9)²/5)/4 = 1.928/4 = .482.

d. The new sample values are: 16.4 15.9 14.6 15.2 15.8. While the new mean is 15.58, 
all the deviations are the same as in part (a), and the variance of the transformed data is 
identical to that of part (b). 
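
As a cross-check, the definitional formula, the computational shortcut, and the shifted data of part (d) can be compared numerically. This is a minimal sketch using the five observations quoted in part (a); the variable names are illustrative only.

```python
# The five observations quoted in part (a).
data = [116.4, 115.9, 114.6, 115.2, 115.8]
n = len(data)

# Definition: sum of squared deviations from the mean, divided by n - 1.
mean = sum(data) / n
deviations = [x - mean for x in data]
s2_definition = sum(d ** 2 for d in deviations) / (n - 1)

# Shortcut: (sum of x^2 - (sum of x)^2 / n) / (n - 1).
sum_x = sum(data)
sum_x2 = sum(x ** 2 for x in data)
s2_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Part (d): subtracting 100 from every value leaves the variance unchanged.
shifted = [x - 100 for x in data]
shifted_mean = sum(shifted) / n
s2_shifted = sum((y - shifted_mean) ** 2 for y in shifted) / (n - 1)

print(mean, s2_definition, s2_shortcut, s2_shifted, s2_definition ** 0.5)
```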


47. 
a. From software, x̃ = 14.7% and x̄ = 14.88%. The sample average alcohol content of
these 10 wines was 14.88%. Half the wines have alcohol content below 14.7% and half
are above 14.7% alcohol.


b. Working long-hand, Σ(x_i - x̄)² = (14.8 - 14.88)² + ... + (15.0 - 14.88)² = 7.536. The
sample variance equals s² = Σ(x_i - x̄)²/(n - 1) = 7.536/(10 - 1) = 0.837.


c. Subtracting 13 from each value will not affect the variance. The 10 new observations are
1.8, 1.5, 3.1, 1.2, 2.9, 0.7, 3.2, 1.6, 0.8, and 2.0. The sum and sum of squares of these 10
new numbers are Σy_i = 18.8 and Σy_i² = 42.88. Using the sample variance shortcut, we
obtain s² = [42.88 - (18.8)²/10]/(10 - 1) = 7.536/9 = 0.837 again.
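
The shift-invariance used in part (c) can be confirmed with Python's standard `statistics` module. This sketch starts from the ten shifted values listed above and adds 13 back to recover the original alcohol contents; it is a check on the arithmetic, not the software output referred to in part (a).

```python
import statistics

# Shifted observations y_i = x_i - 13, as listed in part (c).
y = [1.8, 1.5, 3.1, 1.2, 2.9, 0.7, 3.2, 1.6, 0.8, 2.0]
x = [yi + 13 for yi in y]  # original alcohol contents (%)

mean_x = statistics.mean(x)
median_x = statistics.median(x)

# Sample variance (divisor n - 1) of the original and shifted data.
var_x = statistics.variance(x)
var_y = statistics.variance(y)

# Shortcut formula applied to the shifted data.
n = len(y)
var_shortcut = (sum(v ** 2 for v in y) - sum(y) ** 2 / n) / (n - 1)

print(mean_x, median_x, var_x, var_y, var_shortcut)
```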


49.
a. Σx_i = 2.75 + ... + 3.01 = 56.80, Σx_i² = 2.75² + ... + 3.01² = 197.8040


b. s² = [197.8040 - (56.80)²/17]/16 = 8.0252/16 = .5016, s = .708
51.

a. From software, s² = 1264.77 min² and s = 35.56 min. Working by hand, Σx = 2563 and
Σx² = 368501, so
s² = [368501 - (2563)²/19]/(19 - 1) = 1264.766 and s = √1264.766 = 35.564

b. If y = time in hours, then y = cx where c = 1/60. So, s_y² = c²s_x² = (1/60)²(1264.766) = .351 hr²
and s_y = cs_x = (1/60)(35.564) = .593 hr.
53. 

a. Using software, for the sample of balanced funds we have x̄ = 1.121, x̃ = 1.050, s = 0.536;
for the sample of growth funds we have x̄ = 1.244, x̃ = 1.100, s = 0.448.

b. The distribution of expense ratios for this sample of balanced funds is fairly symmetric, 
while the distribution for growth funds is positively skewed. These balanced and growth 
mutual funds have similar median expense ratios (1.05% and 1.10%, respectively), but 
expense ratios are generally higher for growth funds. The lone exception is a balanced 
fund with a 2.86% expense ratio. (There is also one unusually low expense ratio in the 
sample of balanced funds, at 0.09%.) 

55. 


a. Lower half of the data set: 325 325 334 339 356 356 359 359 363 364 364
366 369, whose median, and therefore the lower fourth, is 359 (the 7th observation in the
sorted list).


Upper half of the data set: 370 373 373 374 375 389 392 393 394 397 402
403 424, whose median, and therefore the upper fourth, is 392.


So, f_s = 392 - 359 = 33.


b. Inner fences: 359 - 1.5(33) = 309.5, 392 + 1.5(33) = 441.5.
To be a mild outlier, an observation must be below 309.5 or above 441.5. There are none
in this data set. Clearly, then, there are also no extreme outliers.
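
The fourths, fourth spread, and fences from parts (a) and (b) can be reproduced from the two halves listed in part (a). This is a small sketch using the standard `statistics` module; the 1.5 f_s and 3 f_s multipliers are the inner and outer fence rules used in these solutions.

```python
import statistics

# Lower and upper halves of the sorted data, as listed in part (a).
lower_half = [325, 325, 334, 339, 356, 356, 359, 359, 363, 364, 364, 366, 369]
upper_half = [370, 373, 373, 374, 375, 389, 392, 393, 394, 397, 402, 403, 424]
data = lower_half + upper_half

# The fourths are the medians of the two halves.
lower_fourth = statistics.median(lower_half)
upper_fourth = statistics.median(upper_half)
fourth_spread = upper_fourth - lower_fourth

# Inner fences use 1.5 * f_s, outer fences use 3 * f_s.
inner = (lower_fourth - 1.5 * fourth_spread, upper_fourth + 1.5 * fourth_spread)
outer = (lower_fourth - 3 * fourth_spread, upper_fourth + 3 * fourth_spread)

extreme_outliers = [x for x in data if x < outer[0] or x > outer[1]]
mild_outliers = [x for x in data
                 if (x < inner[0] or x > inner[1]) and x not in extreme_outliers]

print(lower_fourth, upper_fourth, fourth_spread, inner, mild_outliers, extreme_outliers)
```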


c. A boxplot of this data appears below. The distribution of escape times is roughly
symmetric with no outliers. Notice the box plot “hides” the fact that the distribution
contains two gaps, which can be seen in the stem-and-leaf display.


[Boxplot: Escape time (sec)]


d. Not until the value x = 424 is lowered below the upper fourth value of 392 would there be
any change in the value of the upper fourth (and, thus, of the fourth spread). That is, the
value x = 424 could not be decreased by more than 424 - 392 = 32 seconds.


57. 
a. f_s = 216.8 - 196.0 = 20.8
inner fences: 196 - 1.5(20.8) = 164.8, 216.8 + 1.5(20.8) = 248
outer fences: 196 - 3(20.8) = 133.6, 216.8 + 3(20.8) = 279.2
Of the observations listed, 125.8 is an extreme low outlier and 250.2 is a mild high
outlier.


b. A boxplot of this data appears below. There is a bit of positive skew to the data but,
except for the two outliers identified in part (a), the variation in the data is relatively
small.
59.
a. If you aren’t using software, don’t forget to sort the data first!


ED: median = .4, lower fourth = (.1 + .1)/2 = .1, upper fourth = (2.7 + 2.8)/2 = 2.75,
fourth spread = 2.75 - .1 = 2.65


Non-ED: median = (1.5 + 1.7)/2 = 1.6, lower fourth = .3, upper fourth = 7.9,
fourth spread = 7.9 - .3 = 7.6.


b. ED: mild outliers are less than .1 - 1.5(2.65) = -3.875 or greater than 2.75 + 1.5(2.65) =
6.725. Extreme outliers are less than .1 - 3(2.65) = -7.85 or greater than 2.75 + 3(2.65) =
10.7. So, the two largest observations (11.7, 21.0) are extreme outliers and the next two
largest values (8.9, 9.2) are mild outliers. There are no outliers at the lower end of the
data.


Non-ED: mild outliers are less than .3 - 1.5(7.6) = -11.1 or greater than 7.9 + 1.5(7.6) =
19.3. Note that there are no mild outliers in the data, hence there cannot be any extreme
outliers, either.


c. A comparative boxplot appears below. The outliers in the ED data are clearly visible.
There is noticeable positive skewness in both samples; the Non-ED sample has more
variability than the ED sample; the typical values of the ED sample tend to be smaller
than those for the Non-ED sample.


[Comparative boxplot: State vs. Cocaine concentration (mg/L)]


61. Outliers occur in the 6 a.m. data. The distributions at the other times are fairly symmetric.
Variability and the “typical” gasoline-vapor coefficient values increase somewhat until 2 p.m.,
then decrease slightly.
Supplementary Exercises


63. As seen in the histogram below, this noise distribution is bimodal (but close to unimodal) with
a positive skew and no outliers. The mean noise level is 64.89 dB and the median noise level
is 64.7 dB. The fourth spread of the noise measurements is about 70.4 - 57.8 = 12.6 dB.


a. The histogram appears below. A representative value for this data would be around 90 
MPa. The histogram is reasonably symmetric, unimodal, and somewhat bell-shaped with 
a fair amount of variability (s = 3 or 4 MPa). 


b. The proportion of the observations that are at least 85 is 1 - (6 + 7)/169 = .9231. The
proportion less than 95 is 1 - (13 + 3)/169 = .9053.


c. 90 is the midpoint of the class 89-<91, which contains 43 observations (a relative
frequency of 43/169 = .2544). Therefore about half of this frequency, .1272, should be
added to the relative frequencies for the classes to the left of x = 90. That is, the
approximate proportion of observations that are less than 90 is .0355 + .0414 + .1006 +
.1775 + .1272 = .4822.


67. 
a. Aortic root diameters for males have mean 3.64 cm, median 3.70 cm, standard deviation
0.269 cm, and fourth spread 0.40 cm. The corresponding values for females are x̄ = 3.28 cm,
x̃ = 3.15 cm, s = 0.478 cm, and f_s = 0.50 cm. Aortic root diameters are typically (though
not universally) somewhat smaller for females than for males, and females show more
variability. The distribution for males is negatively skewed, while the distribution for
females is positively skewed (see graphs below).


[Boxplot of M, F]


b. For females (n = 10), the 10% trimmed mean is the average of the middle 8 observations:
x̄_tr(10) = 3.24 cm. For males (n = 13), the 1/13 trimmed mean is 40.2/11 = 3.6545, and
the 2/13 trimmed mean is 32.8/9 = 3.6444. Interpolating, the 10% trimmed mean is
x̄_tr(10) = 0.7(3.6545) + 0.3(3.6444) = 3.65 cm. (10% is three-tenths of the way from 1/13
to 2/13.)


69. 


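\[
\bar{y} = \frac{\sum y_i}{n} = \frac{\sum (a x_i + b)}{n} = \frac{a\sum x_i + nb}{n} = a\bar{x} + b
\]

\[
s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n-1}
      = \frac{\sum \bigl(a x_i + b - (a\bar{x} + b)\bigr)^2}{n-1}
      = \frac{\sum (a x_i - a\bar{x})^2}{n-1}
      = \frac{a^2 \sum (x_i - \bar{x})^2}{n-1}
      = a^2 s_x^2
\]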
x = °C, y = °F

\[
\bar{y} = \tfrac{9}{5}(87.3) + 32 = 189.14\ ^{\circ}\mathrm{F}
\]

\[
s_y = \sqrt{s_y^2} = \sqrt{\left(\tfrac{9}{5}\right)^2 (1.04)^2} = \sqrt{3.5044} = 1.872\ ^{\circ}\mathrm{F}
\]


13 0 
a. The mean, median, and trimmed mean are virtually identical, which suggests symmetry. 
If there are outliers, they are balanced. The range of values is only 25.5, but half of the 
values are between 132.95 and 138.25. 


b. See the comments for (a). In addition, using 1.5(Q3 — Q1) as a yardstick, the two largest 
and three smallest observations are mild outliers. 


[Boxplot: Strength]

75.


From software, x̄ = .9255, s = .0809; x̃ = .93, f_s = .1. The cadence observations are slightly
skewed (mean = .9255 strides/sec, median = .93 strides/sec) and show a small amount of
variability (standard deviation = .0809, fourth spread = .1). There are no apparent outliers in
the data.


 7 | 8             stem = tenths
 8 | 11556         leaf = hundredths
 9 | 2233335566
10 | 0566

[Boxplot: Cadence (strides per second)]


a. The median is the same (371) in each plot and all three data sets are very symmetric. In 
addition, all three have the same minimum value (350) and same maximum value (392). 
Moreover, all three data sets have the same lower (364) and upper quartiles (378). So, all 
three boxplots will be identical. (Slight differences in the boxplots below are due to the 
way Minitab software interpolates to calculate the quartiles.) 


[Boxplots: Fatigue limit (MPa)]

b. A comparative dotplot is shown below. These graphs show that there are differences in
the variability of the three data sets. They also show differences in the way the values are 
distributed in the three data sets, especially big differences in the presence of gaps and 
clusters. 


[Comparative dotplot: Fatigue limit (MPa)]


c. The boxplot in (a) is not capable of detecting the differences among the data sets. The 
primary reason is that boxplots give up some detail in describing data because they use 
only five summary numbers for comparing data sets. 


77.
a.

0 | 444444444577888999         stem = 1.0
1 | 00011111111124455669999    leaf = 0.1
2 | 1234457
3 | 11355
4 | 17
5 | 3
6 |
7 | 67
8 | 1
HI: 10.44, 13.41

b. Since the intervals have unequal width, you must use a density scale.


[Histogram (density scale): Pitting depth (mm)]


c. Representative depths are quite similar for the three types of soils: between 1.5 and 2.
Data from the C and CL soils shows much more variability than for the other two types. 
The boxplots for the first three types show substantial positive skewness both in the 
middle 50% and overall. The boxplot for the SYCL soil shows negative skewness in the 
middle 50% and mild positive skewness overall. Finally, there are multiple outliers for 
the first three types of soils, including extreme outliers. 


79. 
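a.
\[
\sum_{i=1}^{n+1} x_i = \sum_{i=1}^{n} x_i + x_{n+1} = n\bar{x}_n + x_{n+1},
\quad\text{so}\quad
\bar{x}_{n+1} = \frac{\sum_{i=1}^{n+1} x_i}{n+1} = \frac{n\bar{x}_n + x_{n+1}}{n+1}
\]


b. In the second line below, we artificially add and subtract $n\bar{x}_n^2$ to create the term needed
for the sample variance:

\[
n s_{n+1}^2 = \sum_{i=1}^{n+1} (x_i - \bar{x}_{n+1})^2 = \sum_{i=1}^{n+1} x_i^2 - (n+1)\bar{x}_{n+1}^2
\]
\[
= \sum_{i=1}^{n} x_i^2 + x_{n+1}^2 - (n+1)\bar{x}_{n+1}^2
= \Bigl[\sum_{i=1}^{n} x_i^2 - n\bar{x}_n^2\Bigr] + n\bar{x}_n^2 + x_{n+1}^2 - (n+1)\bar{x}_{n+1}^2
\]
\[
= (n-1)s_n^2 + \bigl\{x_{n+1}^2 + n\bar{x}_n^2 - (n+1)\bar{x}_{n+1}^2\bigr\}
\]

Substitute the expression for $\bar{x}_{n+1}$ from part (a) into the expression in braces, and it
simplifies to $\frac{n}{n+1}(x_{n+1} - \bar{x}_n)^2$, as desired.


c.
\[
\bar{x}_{n+1} = \frac{15(12.58) + 11.8}{16} = \frac{200.5}{16} = 12.53
\]
Then, solving (b) for $s_{n+1}^2$ gives
\[
s_{n+1}^2 = \frac{n-1}{n}s_n^2 + \frac{1}{n+1}(x_{n+1} - \bar{x}_n)^2
          = \frac{14}{15}(.512)^2 + \frac{1}{16}(11.8 - 12.58)^2 = .283.
\]
Finally, the standard deviation is $s_{n+1} = \sqrt{.283} = .532$.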
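
The updating formulas from parts (a) and (b) can be tested numerically. The sketch below wraps them in a helper called `update` (a made-up name), checks them against a direct calculation on a small hypothetical sample, and then applies them to the summary values of part (c), where the previous standard deviation is taken to be .512.

```python
import statistics

def update(mean_n, var_n, n, x_new):
    """Mean and sample variance of n + 1 values, from the mean and variance of the first n."""
    mean_next = (n * mean_n + x_new) / (n + 1)
    var_next = (n - 1) / n * var_n + (x_new - mean_n) ** 2 / (n + 1)
    return mean_next, var_next

# Hypothetical sample, used only to compare the recursion with a direct calculation.
sample = [2.0, 3.5, 1.0, 4.0, 2.5]
first, last = sample[:-1], sample[-1]
mean_rec, var_rec = update(statistics.mean(first), statistics.variance(first), len(first), last)

# Part (c): n = 15, previous mean 12.58, previous standard deviation .512, new value 11.8.
mean_16, var_16 = update(12.58, 0.512 ** 2, 15, 11.8)
sd_16 = var_16 ** 0.5

print(mean_rec, var_rec, mean_16, var_16, sd_16)
```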


81. Assuming that the histogram is unimodal, then there is evidence of positive skewness in the 
data since the median lies to the left of the mean (for a symmetric distribution, the mean and 
median would coincide). 


For more evidence of skewness, compare the distances of the 5th and 95th percentiles from the
median: median - 5th %ile = 500 - 400 = 100, while 95th %ile - median = 720 - 500 = 220.
Thus, the largest 5% of the values (above the 95th percentile) are further from the median
than are the lowest 5%. The same skewness is evident when comparing the 10th and 90th
percentiles to the median, or comparing the maximum and minimum to the median.


a. When there is perfect symmetry, the smallest observation y_1 and the largest observation
y_n will be equidistant from the median, so y_n - x̃ = x̃ - y_1. Similarly, the second-smallest
and second-largest will be equidistant from the median, so y_(n-1) - x̃ = x̃ - y_2, and so on.


Thus, the first and second numbers in each pair will be equal, so that each point in the
plot will fall exactly on the 45° line.


When the data is positively skewed, y_n will be much further from the median than is y_1,
so y_n - x̃ will considerably exceed x̃ - y_1 and the point (y_n - x̃, x̃ - y_1) will fall
considerably below the 45° line, as will the other points in the plot.


b. The median of these n = 26 observations is 221.6 (the midpoint of the 13th and 14th
ordered values). The first point in the plot is (2745.6 - 221.6, 221.6 - 4.1) = (2524.0,
217.5). The others are: (1476.2, 213.9), (1434.4, 204.1), (756.4, 190.2), (481.8, 188.9),
(267.5, 181.0), (208.4, 129.2), (112.5, 106.3), (81.2, 103.3), (53.1, 102.6), (53.1, 92.0),
(33.4, 23.0), and (20.9, 20.9). The first number in each of the first seven pairs greatly
exceeds the second number, so each of those points falls well below the 45° line. A
substantial positive skew (stretched upper tail) is indicated.
Section 2.1


S = {1324, 1342, 1423, 1432, 2314, 2341, 2413, 2431, 3124, 3142, 4123, 4132, 3214, 3241, 4213,
4231}.


Event A contains the outcomes where 1 is first in the list:
A = {1324, 1342, 1423, 1432}.


Event B contains the outcomes where 2 is first or second:
B = {2314, 2341, 2413, 2431, 3214, 3241, 4213, 4231}.


The event A∪B contains the outcomes in A or B or both:

A∪B = {1324, 1342, 1423, 1432, 2314, 2341, 2413, 2431, 3214, 3241, 4213, 4231}.
A∩B = ∅, since 1 and 2 can’t both get into the championship game.

A' = S - A = {2314, 2341, 2413, 2431, 3124, 3142, 4123, 4132, 3214, 3241, 4213, 4231}.
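
These event manipulations map directly onto Python's built-in set operations. The sketch below copies the 16 outcomes of S from the solution and rebuilds A and B from their verbal descriptions; the variable names are illustrative.

```python
# The 16 outcomes of the sample space, copied from the solution.
S = {"1324", "1342", "1423", "1432", "2314", "2341", "2413", "2431",
     "3124", "3142", "4123", "4132", "3214", "3241", "4213", "4231"}

A = {o for o in S if o[0] == "1"}    # 1 is first in the list
B = {o for o in S if "2" in o[:2]}   # 2 is first or second

A_union_B = A | B
A_intersect_B = A & B
A_complement = S - A

print(sorted(A))
print(sorted(B))
print(sorted(A_union_B))
print(A_intersect_B)
print(sorted(A_complement))
```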


A = {SSF, SFS, FSS}. 
B = {SSS, SSF, SFS, FSS}. 


For event C to occur, the system must have component 1 working (S in the first position), then at least
one of the other two components must work (at least one S in the second and third positions): C =
{SSS, SSF, SFS}.


C' = {SFF, FSS, FSF, FFS, FFF}.

A∪C = {SSS, SSF, SFS, FSS}.

A∩C = {SSF, SFS}.

B∪C = {SSS, SSF, SFS, FSS}. Notice that B contains C, so B∪C = B.
B∩C = {SSS, SSF, SFS}. Since B contains C, B∩C = C.
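
The same events can be generated by enumeration. In this sketch the definitions of A (exactly two components work) and B (at least two components work) are inferred from the sets listed above rather than quoted from the exercise, so treat them as assumptions.

```python
from itertools import product

# All 2^3 outcomes for three components, each S (works) or F (fails).
S = {"".join(p) for p in product("SF", repeat=3)}

A = {o for o in S if o.count("S") == 2}             # assumed: exactly two components work
B = {o for o in S if o.count("S") >= 2}             # assumed: at least two components work
C = {o for o in S if o[0] == "S" and "S" in o[1:]}  # component 1 works and at least one other

C_complement = S - C
A_union_C = A | C
A_intersect_C = A & C
B_union_C = B | C
B_intersect_C = B & C

print(sorted(C), sorted(C_complement), sorted(A_union_C), sorted(A_intersect_C))
```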
a. The 3³ = 27 possible outcomes are numbered below for later reference.


Outcome 


b. Outcome numbers 1, 14, 27 above. 
c. Outcome numbers 6, 8, 12, 16, 20, 22 above. 


d. Outcome numbers 1, 3, 7, 9, 19, 21, 25, 27 above.


a. S = {BBBAAAA, BBABAAA, BBAABAA, BBAAABA, BBAAAAB, BABBAAA, BABABAA, BABAABA,
BABAAAB, BAABBAA, BAABABA, BAABAAB, BAAABBA, BAAABAB, BAAAABB, ABBBAAA, 
ABBABAA, ABBAABA, ABBAAAB, ABABBAA, ABABABA, ABABAAB, ABAABBA, ABAABAB, 
ABAAABB, AABBBAA, AABBABA, AABBAAB, AABABBA, AABABAB, AABAABB, AAABBBA, 
AAABBAB, AAABABB, AAAABBB}. 


b. AAAABBB, AAABABB, AAABBAB, AABAABB, AABABAB. 
a. In the diagram on the left, the shaded area is (A∪B)'. On the right, the shaded area is A', the striped
area is B', and the intersection A'∩B' occurs where there is both shading and stripes. These two
diagrams display the same area.


b. In the diagram below, the shaded area represents (A∩B)'. Using the right-hand diagram from (a), the
union of A' and B' is represented by the areas that have either shading or stripes (or both). Both of the
diagrams display the same area.


Section 2.2 


11. 

a. .07. 

b. .15+.10+ .05 =.30. 

c. Let A = the selected individual owns shares in a stock fund. Then P(A) = .18 + .25 = .43. The desired
probability, that a selected customer does not own shares in a stock fund, equals P(A') = 1 - P(A) = 1 - .43
= .57. This could also be calculated by adding the probabilities for all the funds that are not stocks.

13. 


a. A1∪A2 = “awarded either #1 or #2 (or both)”: from the addition rule,
P(A1∪A2) = P(A1) + P(A2) - P(A1∩A2) = .22 + .25 - .11 = .36.


b. A1'∩A2' = “awarded neither #1 nor #2”: using the hint and part (a),
P(A1'∩A2') = P((A1∪A2)') = 1 - P(A1∪A2) = 1 - .36 = .64.


c. A1∪A2∪A3 = “awarded at least one of these three projects”: using the addition rule for 3 events,
P(A1∪A2∪A3) = P(A1) + P(A2) + P(A3) - P(A1∩A2) - P(A1∩A3) - P(A2∩A3) + P(A1∩A2∩A3) =
.22 + .25 + .28 - .11 - .05 - .07 + .01 = .53.


d. A1'∩A2'∩A3' = “awarded none of the three projects”:
P(A1'∩A2'∩A3') = 1 - P(awarded at least one) = 1 - .53 = .47.
Chapter 2: Probability


e. A1'∩A2'∩A3 = “awarded #3 but neither #1 nor #2”: from a Venn diagram,
P(A1'∩A2'∩A3) = P(A3) - P(A1∩A3) - P(A2∩A3) + P(A1∩A2∩A3) =
.28 - .05 - .07 + .01 = .17. The last term addresses the “double counting” of the two subtractions.


f. (A1'∩A2')∪A3 = “awarded neither of #1 and #2, or awarded #3”: from a Venn diagram,
P((A1'∩A2')∪A3) = P(none awarded) + P(A3) = .47 (from d) + .28 = .75.
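
Parts (a) through (f) are all arithmetic on seven given probabilities, so they can be recomputed in a few lines. This is a sketch with illustrative variable names; the inputs are the values substituted in the formulas above.

```python
# Probabilities used in the solution.
p1, p2, p3 = 0.22, 0.25, 0.28          # P(A1), P(A2), P(A3)
p12, p13, p23 = 0.11, 0.05, 0.07       # pairwise intersections
p123 = 0.01                            # P(A1 and A2 and A3)

part_a = p1 + p2 - p12                               # P(A1 or A2)
part_b = 1 - part_a                                  # neither #1 nor #2
part_c = p1 + p2 + p3 - p12 - p13 - p23 + p123       # at least one of the three
part_d = 1 - part_c                                  # none of the three
part_e = p3 - p13 - p23 + p123                       # #3 only
part_f = part_d + p3                                 # (neither #1 nor #2) or #3

for label, value in zip("abcdef", [part_a, part_b, part_c, part_d, part_e, part_f]):
    print(label, round(value, 2))
```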


Alternatively, answers to a-f can be obtained from probabilities on the accompanying Venn diagram: 


a. Let E be the event that at most one purchases an electric dryer. Then E' is the event that at least two
purchase electric dryers, and P(E') = 1 - P(E) = 1 - .428 = .572.


b. Let A be the event that all five purchase gas, and let B be the event that all five purchase electric. All
other possible outcomes are those in which at least one of each type of clothes dryer is purchased.
Thus, the desired probability is 1 - [P(A) + P(B)] =
1 - [.116 + .005] = .879.


a. The probabilities do not add to 1 because there are other software packages besides SPSS and SAS for 
which requests could be made. 

b. P(A’) =1-P(A)=1-.30=.70. 

c. Since A and B are mutually exclusive events, P(A∪B) = P(A) + P(B) = .30 + .50 = .80.

d. By deMorgan’s law, P(A'∩B') = P((A∪B)') = 1 - P(A∪B) = 1 - .80 = .20.
In this example, deMorgan’s law says the event “neither A nor B” is the complement of the event 
“either A or B.” (That’s true regardless of whether they’re mutually exclusive.) 


Let A be the event that the selected joint was found defective by inspector A, so P(A) = 724/10,000. Let B be analogous
for inspector B, so P(B) = 751/10,000. The event “at least one of the inspectors judged a joint to be defective” is
A∪B, so P(A∪B) = 1159/10,000.


a. By deMorgan’s law, P(neither A nor B) = P(A'∩B') = 1 - P(A∪B) = 1 - 1159/10,000 = 8841/10,000 = .8841.


b. The desired event is B∩A'. From a Venn diagram, we see that P(B∩A') = P(B) - P(A∩B). From the
addition rule, P(A∪B) = P(A) + P(B) - P(A∩B) gives P(A∩B) = .0724 + .0751 - .1159 = .0316.
Finally, P(B∩A') = P(B) - P(A∩B) = .0751 - .0316 = .0435.


In what follows, the first letter refers to the auto deductible and the second letter refers to the homeowner's 
deductible. 
a. P(MH)=.10. 


b. P(low auto deductible) = P({LN, LL, LM, LH}) = .04 + .06 + .05 + .03 = .18. Following a similar
pattern, P(low homeowner’s deductible) = .06 + .10 + .03 = .19.


c. P(same deductible for both) = P({LL, MM, HH}) = .06 + .20 + .15 = .41.
d. P(deductibles are different) = 1 - P(same deductible for both) = 1 - .41 = .59.


e. P(at least one low deductible) = P({LN, LL, LM, LH, ML, HL}) = .04 + .06 + .05 + .03 + .10 + .03 =
.31.


f. P(neither deductible is low) = 1 — P(at least one low deductible) = 1 —.31 =.69. 
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23. Assume that the computers are numbered 1-6 as described and that computers | and 2 are the two laptops. 
There are 15 possible outcomes: (1,2) (1,3) (1,4) (1,5) (1,6) (2,3) (2,4) (2,5) (2,6) (3,4) (3,5) (3,6) (4,5) 
(4,6) and (5,6). 


a. P(both are laptops) = P({(1,2)}) = 1/15 = .067.

b. P(both are desktops) = P({(3,4) (3,5) (3,6) (4,5) (4,6) (5,6)}) = 6/15 = .40.

c. P(at least one desktop) = 1 – P(no desktops) = 1 – P(both are laptops) = 1 – .067 = .933.

d. P(at least one of each type) = 1 – P(both are the same) = 1 – [P(both are laptops) + P(both are desktops)] = 1 – [.067 + .40] = .533.
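
The four answers above can be checked by brute-force enumeration. The following is a minimal sketch, assuming (as in the solution) that computers 1 and 2 are the laptops; the helper name prob is illustrative only.

```python
from fractions import Fraction
from itertools import combinations

computers = [1, 2, 3, 4, 5, 6]
laptops = {1, 2}  # assumption taken from the solution: 1 and 2 are the laptops

# The 15 equally likely unordered pairs of computers
outcomes = list(combinations(computers, 2))

def prob(event):
    """Probability of an event (a predicate on outcomes) under equal likelihood."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

p_both_laptops = prob(lambda o: set(o) <= laptops)
p_both_desktops = prob(lambda o: not (set(o) & laptops))
p_at_least_one_desktop = 1 - p_both_laptops
p_one_of_each = 1 - (p_both_laptops + p_both_desktops)

print(p_both_laptops, p_both_desktops, p_at_least_one_desktop, p_one_of_each)
```

Exact fractions are used so that the rounded decimals in parts a-d can be compared against unrounded values.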


25. By rearranging the addition rule, P(A ∩ B) = P(A) + P(B) – P(A ∪ B) = .40 + .55 – .63 = .32. By the same method, P(A ∩ C) = .40 + .70 – .77 = .33 and P(B ∩ C) = .55 + .70 – .80 = .45. Finally, rearranging the addition rule for 3 events gives
P(A ∩ B ∩ C) = P(A ∪ B ∪ C) – P(A) – P(B) – P(C) + P(A ∩ B) + P(A ∩ C) + P(B ∩ C) = .85 – .40 – .55 – .70 + .32 + .33 + .45 = .30.

These probabilities are reflected in the Venn diagram below. 


a. P(A ∪ B ∪ C) = .85, as given.
b. P(none selected) = 1 – P(at least one selected) = 1 – P(A ∪ B ∪ C) = 1 – .85 = .15.
c. From the Venn diagram, P(only automatic transmission selected) = .22.
d. From the Venn diagram, P(exactly one of the three) = .05 + .08 + .22 = .35.

27. There are 10 equally likely outcomes: {A, B} {A, Co} {A, Cr} {A, F} {B, Co} {B, Cr} {B, F} {Co, Cr} {Co, F} and {Cr, F}.
a. P({A, B}) = 1/10 = .1.

b. P(at least one C) = P({A, Co} or {A, Cr} or {B, Co} or {B, Cr} or {Co, Cr} or {Co, F} or {Cr, F}) = 7/10 = .7.

c. Replacing each person with his/her years of experience, P(at least 15 years) = P({3, 14} or {6, 10} or {6, 14} or {7, 10} or {7, 14} or {10, 14}) = 6/10 = .6.

Section 2.3


29.
a. There are 26 letters, so allowing repeats there are (26)(26) = (26)^2 = 676 possible 2-letter domain names. Add in the 10 digits, and there are 36 characters available, so allowing repeats there are (36)(36) = (36)^2 = 1296 possible 2-character domain names.

b. By the same logic as part a, the answers are (26)^3 = 17,576 and (36)^3 = 46,656.

c. Continuing, (26)^4 = 456,976; (36)^4 = 1,679,616.

d. P(4-character sequence is already owned) = 1 – P(4-character sequence still available) = 1 – 97,786/(36)^4 = .942.

31.
a. Use the Fundamental Counting Principle: (9)(5) = 45.

b. By the same reasoning, there are (9)(5)(32) = 1440 such sequences, so such a policy could be carried out for 1440 successive nights, or almost 4 years, without repeating exactly the same program.

33.
a. Since there are 15 players and 9 positions, and order matters in a line-up (catcher, pitcher, shortstop, etc. are different positions), the number of possibilities is P9,15 = (15)(14)...(7) or 15!/(15 – 9)! = 1,816,214,400.

b. For each of the starting line-ups in part (a), there are 9! possible batting orders. So, multiply the answer from (a) by 9! to get (1,816,214,400)(362,880) = 659,067,881,472,000.

c. Order still matters: There are P3,5 = 60 ways to choose three left-handers for the outfield and P6,10 = 151,200 ways to choose six right-handers for the other positions. The total number of possibilities is (60)(151,200) = 9,072,000.

35.
a. There are C(10,5) = 252 ways to select 5 workers from the day shift. In other words, of all the ways to select 5 workers from among the 24 available, 252 such selections result in 5 day-shift workers. Since the grand total number of possible selections is C(24,5) = 42504, the probability of randomly selecting 5 day-shift workers (and, hence, no swing or graveyard workers) is 252/42504 = .00593.

b. Similar to a, there are C(8,5) = 56 ways to select 5 swing-shift workers and C(6,5) = 6 ways to select 5 graveyard-shift workers. So, there are 252 + 56 + 6 = 314 ways to pick 5 workers from the same shift. The probability of this randomly occurring is 314/42504 = .00739.

c. P(at least two shifts represented) = 1 – P(all from same shift) = 1 – .00739 = .99261.


d. There are several ways to approach this question. For example, let A1 = "day shift is unrepresented," A2 = "swing shift is unrepresented," and A3 = "graveyard shift is unrepresented." Then we want P(A1 ∪ A2 ∪ A3).

N(A1) = N(day shift unrepresented) = N(all from swing/graveyard) = C(8+6, 5) = 2002, since there are 8 + 6 = 14 total employees in the swing and graveyard shifts. Similarly, N(A2) = C(10+6, 5) = 4368 and N(A3) = C(10+8, 5) = 8568. Next, N(A1 ∩ A2) = N(all from graveyard) = 6 from b. Similarly, N(A1 ∩ A3) = 56 and N(A2 ∩ A3) = 252. Finally, N(A1 ∩ A2 ∩ A3) = 0, since at least one shift must be represented. Now, apply the addition rule for 3 events:

P(A1 ∪ A2 ∪ A3) = (2002 + 4368 + 8568 – 6 – 56 – 252 + 0)/42504 = 14624/42504 = .3441.
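
The inclusion-exclusion count above can be reproduced with a few binomial coefficients. This is a sketch only; the shift sizes (10 day, 8 swing, 6 graveyard) are taken from the solution, and the variable names are illustrative.

```python
from math import comb

day, swing, graveyard = 10, 8, 6  # shift sizes used in the solution
total = comb(day + swing + graveyard, 5)  # all ways to choose 5 of the 24 workers

# N(Ai): selections in which shift i is unrepresented
n_a1 = comb(swing + graveyard, 5)  # no day-shift workers
n_a2 = comb(day + graveyard, 5)    # no swing-shift workers
n_a3 = comb(day + swing, 5)        # no graveyard-shift workers

# Pairwise intersections: all 5 workers come from the one remaining shift
n_a1a2 = comb(graveyard, 5)
n_a1a3 = comb(swing, 5)
n_a2a3 = comb(day, 5)

# The triple intersection is empty (some shift must be represented)
n_union = n_a1 + n_a2 + n_a3 - n_a1a2 - n_a1a3 - n_a2a3 + 0
p_union = n_union / total

print(n_union, total, round(p_union, 4))
```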


By the Fundamental Counting Principle, with n1 = 3, n2 = 4, and n3 = 5, there are (3)(4)(5) = 60 runs.

With n1 = 1 (just one temperature), n2 = 2, and n3 = 5, there are (1)(2)(5) = 10 such runs.

For each of the 5 specific catalysts, there are (3)(4) = 12 pairings of temperature and pressure. Imagine we separate the 60 possible runs into those 5 sets of 12. The number of ways to select exactly one run from each of these 5 sets of 12 is [C(12,1)]^5 = 12^5. Since there are C(60,5) ways to select the 5 runs overall, the desired probability is [C(12,1)]^5 / C(60,5) = 12^5 / C(60,5) = .0456.


39. In a-c, the size of the sample space is N = C(5+6+4, 3) = C(15,3) = 455.

a. There are four 23W bulbs available and 5 + 6 = 11 non-23W bulbs available. The number of ways to select exactly two of the former (and, thus, exactly one of the latter) is C(4,2)·C(11,1) = 6(11) = 66. Hence, the probability is 66/455 = .145.

b. The number of ways to select three 13W bulbs is C(5,3) = 10. Similarly, there are C(6,3) = 20 ways to select three 18W bulbs and C(4,3) = 4 ways to select three 23W bulbs. Put together, there are 10 + 20 + 4 = 34 ways to select three bulbs of the same wattage, and so the probability is 34/455 = .075.

c. The number of ways to obtain one of each type is C(5,1)·C(6,1)·C(4,1) = (5)(6)(4) = 120, and so the probability is 120/455 = .264.

d. Rather than consider many different options (choose 1, choose 2, etc.), re-frame the problem this way: at least 6 draws are required to get a 23W bulb iff a random sample of five bulbs fails to produce a 23W bulb. Since there are 11 non-23W bulbs, the chance of getting no 23W bulbs in a sample of size 5 is C(11,5)/C(15,5) = 462/3003 = .154.


41.
a. (10)(10)(10)(10) = 10^4 = 10,000. These are the strings 0000 through 9999.

b. Count the number of prohibited sequences. There are (i) 10 with all digits identical (0000, 1111, ..., 9999); (ii) 14 with sequential digits (0123, 1234, 2345, 3456, 4567, 5678, 6789, and 7890, plus these same seven descending); (iii) 100 beginning with 19 (1900 through 1999). That's a total of 10 + 14 + 100 = 124 impermissible sequences, so there are a total of 10,000 – 124 = 9876 permissible sequences. The chance of randomly selecting one is just 9876/10,000 = .9876.

c. All PINs of the form 8xx1 are legitimate, so there are (10)(10) = 100 such PINs. With someone randomly selecting 3 such PINs, the chance of guessing the correct sequence is 3/100 = .03.

d. Of all the PINs of the form 1xx1, eleven are prohibited: 1111, and the ten of the form 19x1. That leaves 89 possibilities, so the chance of correctly guessing the PIN in 3 tries is 3/89 = .0337.


43. There are C(52,5) = 2,598,960 five-card hands. The number of 10-high straights is (4)(4)(4)(4)(4) = 4^5 = 1024 (any of four 6s, any of four 7s, etc.). So, P(10 high straight) = 1024/2,598,960 = .000394. Next, there are ten "types" of straight: A2345, 23456, ..., 910JQK, 10JQKA. So, P(straight) = 10 × 1024/2,598,960 = .00394. Finally, there are only 40 straight flushes: each of the ten sequences above in each of the 4 suits makes (10)(4) = 40. So, P(straight flush) = 40/2,598,960 = .00001539.
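
As a quick arithmetic check of the three poker probabilities, here is a short sketch. It follows the counting in the solution as written (in particular, the count of straights there includes the 40 straight flushes); the variable names are illustrative.

```python
from math import comb

hands = comb(52, 5)      # number of five-card hands
ten_high = 4 ** 5        # one of four suits for each of the ranks 6, 7, 8, 9, 10
straight_types = 10      # A2345 through 10JQKA
straight_flushes = 10 * 4

p_ten_high = ten_high / hands
p_straight = straight_types * ten_high / hands
p_straight_flush = straight_flushes / hands

print(hands, p_ten_high, p_straight, p_straight_flush)
```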


Section 2.4 


45. 
a. P(A) = .106 + .141 + .200 = .447, P(C) = .215 + .200 + .065 + .020 = .500, and P(A ∩ C) = .200.

b. P(A | C) = P(A ∩ C)/P(C) = .200/.500 = .400. If we know that the individual came from ethnic group 3, the probability that he has Type A blood is .40. P(C | A) = P(A ∩ C)/P(A) = .200/.447 = .447. If a person has Type A blood, the probability that he is from ethnic group 3 is .447.

c. Define D = "ethnic group 1 selected." We are asked for P(D | B'). From the table, P(D ∩ B') = .082 + .106 + .004 = .192 and P(B') = 1 – P(B) = 1 – [.008 + .018 + .065] = .909. So, the desired probability is P(D | B') = P(D ∩ B')/P(B') = .192/.909 = .211.

53.
Apply the addition rule for three events: P(A ∪ B ∪ C) = .6 + .4 + .2 – .3 – .15 – .1 + .08 = .73.

P(A ∩ B ∩ C') = P(A ∩ B) – P(A ∩ B ∩ C) = .3 – .08 = .22.

P(B | A) = P(A ∩ B)/P(A) = .3/.6 = .50 and P(A | B) = P(A ∩ B)/P(B) = .3/.4 = .75. Half of students with Visa cards also have a MasterCard, while three-quarters of students with a MasterCard also have a Visa card.

P(A ∩ B | C) = P([A ∩ B] ∩ C)/P(C) = P(A ∩ B ∩ C)/P(C) = .08/.2 = .40.

P(A ∪ B | C) = P([A ∪ B] ∩ C)/P(C) = P([A ∩ C] ∪ [B ∩ C])/P(C). Use a distributive law:
= [P(A ∩ C) + P(B ∩ C) – P([A ∩ C] ∩ [B ∩ C])]/P(C) = [P(A ∩ C) + P(B ∩ C) – P(A ∩ B ∩ C)]/P(C) = (.15 + .1 – .08)/.2 = .85.


P(small cup) = .14 + .20 = .34. P(decaf) = .20 + .10 + .10 = .40.

P(decaf | small) = P(small ∩ decaf)/P(small) = .20/.34 = .588. 58.8% of all people who purchase a small cup of coffee choose decaf.

P(small | decaf) = P(small ∩ decaf)/P(decaf) = .20/.40 = .50. 50% of all people who purchase decaf coffee choose the small size.


Let A = child has a food allergy, and R = child has a history of severe reaction. We are told that P(A) = .08 and P(R | A) = .39. By the multiplication rule, P(A ∩ R) = P(A) × P(R | A) = (.08)(.39) = .0312.

Let M = the child is allergic to multiple foods. We are told that P(M | A) = .30, and the goal is to find P(M). But notice that M is actually a subset of A: you can't have multiple food allergies without having at least one such allergy! So, apply the multiplication rule again:

P(M) = P(M ∩ A) = P(A) × P(M | A) = (.08)(.30) = .024.

P(B | A) = P(A ∩ B)/P(A) = P(B)/P(A) = .05/.60 = .0833 (since B is contained in A, A ∩ B = B).

Let A = {carries Lyme disease} and B = {carries HGE}. We are told P(A) = .16, P(B) = .10, and P(A ∩ B | A ∪ B) = .10. From this last statement and the fact that A ∩ B is contained in A ∪ B,
.10 = P(A ∩ B)/P(A ∪ B) => P(A ∩ B) = .10P(A ∪ B) = .10[P(A) + P(B) – P(A ∩ B)] = .10[.10 + .16 – P(A ∩ B)] => 1.1P(A ∩ B) = .026 => P(A ∩ B) = .02364.

Finally, the desired probability is P(A | B) = P(A ∩ B)/P(B) = .02364/.10 = .2364.

P(B | A) > P(B) iff P(B | A) + P(B' | A) > P(B) + P(B' | A) iff 1 > P(B) + P(B' | A) by Exercise 56 (with the letters switched). This holds iff 1 – P(B) > P(B' | A) iff P(B') > P(B' | A), QED.


The required probabilities appear in the tree diagram below.

.4 × .3 = .12 = P(A1 ∩ B) = P(A1)P(B | A1)
.35 × .6 = .21 = P(A2 ∩ B)
.25 × .5 = .125 = P(A3 ∩ B)

a. P(A2 ∩ B) = .21.

b. By the law of total probability, P(B) = P(A1 ∩ B) + P(A2 ∩ B) + P(A3 ∩ B) = .455.

c. Using Bayes' theorem, P(A1 | B) = P(A1 ∩ B)/P(B) = .12/.455 = .264; P(A2 | B) = .21/.455 = .462; P(A3 | B) = 1 – .264 – .462 = .274. Notice the three probabilities sum to 1.
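
The tree-diagram computation can be sketched in a few lines. The priors (.4, .35, .25) and conditional probabilities (.3, .6, .5) are the branch values shown above; the dictionary names are illustrative. Note that computing P(A3 | B) directly gives .125/.455 = .2747, which differs slightly from the .274 obtained above by subtracting rounded values.

```python
# Branch probabilities read off the tree diagram
prior = {"A1": 0.40, "A2": 0.35, "A3": 0.25}
p_b_given = {"A1": 0.30, "A2": 0.60, "A3": 0.50}

# Joint probabilities P(Ai and B) at the ends of the branches
joint = {a: prior[a] * p_b_given[a] for a in prior}

# Law of total probability
p_b = sum(joint.values())

# Bayes' theorem: P(Ai | B) = P(Ai and B) / P(B)
posterior = {a: joint[a] / p_b for a in prior}

print(round(p_b, 3), {a: round(p, 4) for a, p in posterior.items()})
```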


The initial ("prior") probabilities of 0, 1, 2 defectives in the batch are .5, .3, .2. Now, let's determine the probabilities of 0, 1, 2 defectives in the sample based on these three cases.

• If there are 0 defectives in the batch, clearly there are 0 defectives in the sample.
  P(0 def in sample | 0 def in batch) = 1.

• If there is 1 defective in the batch, the chance it's discovered in a sample of 2 equals 2/10 = .2, and the probability it isn't discovered is 8/10 = .8.
  P(0 def in sample | 1 def in batch) = .8, P(1 def in sample | 1 def in batch) = .2.

• If there are 2 defectives in the batch, the chance both are discovered in a sample of 2 equals (2/10) × (1/9) = .022; the chance neither is discovered equals (8/10) × (7/9) = .622; and the chance exactly 1 is discovered equals 1 – (.022 + .622) = .356.
  P(0 def in sample | 2 def in batch) = .622, P(1 def in sample | 2 def in batch) = .356, P(2 def in sample | 2 def in batch) = .022.

These calculations are summarized in the tree diagram below. Probabilities at the endpoints are intersectional probabilities, e.g. P(2 def in batch ∩ 2 def in sample) = (.2)(.022) = .0044.


a. Using the tree diagram and Bayes' rule,

P(0 def in batch | 0 def in sample) = .5/(.5 + .24 + .1244) = .578
P(1 def in batch | 0 def in sample) = .24/(.5 + .24 + .1244) = .278
P(2 def in batch | 0 def in sample) = .1244/(.5 + .24 + .1244) = .144

b. P(0 def in batch | 1 def in sample) = 0
P(1 def in batch | 1 def in sample) = .06/(.06 + .0712) = .457
P(2 def in batch | 1 def in sample) = .0712/(.06 + .0712) = .543
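
These posterior probabilities can be reproduced exactly. The sketch below assumes, as the calculations above imply, a batch of 10 items and a sample of 2 drawn without replacement; the function names likelihood and posterior are illustrative. Because it works with exact fractions rather than the rounded branch values, the last digit can differ slightly (for example 27/59 = .4576 rather than .457).

```python
from fractions import Fraction
from math import comb

BATCH, SAMPLE = 10, 2  # assumed from the 2/10 and 8/10 chances in the solution
prior = {0: Fraction(5, 10), 1: Fraction(3, 10), 2: Fraction(2, 10)}

def likelihood(k, d):
    """P(k defectives in the sample | d defectives in the batch)."""
    return Fraction(comb(d, k) * comb(BATCH - d, SAMPLE - k), comb(BATCH, SAMPLE))

def posterior(d, k):
    """P(d defectives in the batch | k defectives in the sample), by Bayes' rule."""
    evidence = sum(prior[j] * likelihood(k, j) for j in prior)
    return prior[d] * likelihood(k, d) / evidence

for k in (0, 1):
    print(k, [round(float(posterior(d, k)), 3) for d in (0, 1, 2)])
```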


63. 


b. From the top path of the tree diagram, P(A ∩ B ∩ C) = (.75)(.9)(.8) = .54.

c. Event B ∩ C occurs twice on the diagram: P(B ∩ C) = P(A ∩ B ∩ C) + P(A' ∩ B ∩ C) = .54 + (.25)(.8)(.7) = .68.

d. P(C) = P(A ∩ B ∩ C) + P(A ∩ B' ∩ C) + P(A' ∩ B ∩ C) + P(A' ∩ B' ∩ C) = .54 + .045 + .14 + .015 = .74.

e. Rewrite the conditional probability first: P(A | B ∩ C) = P(A ∩ B ∩ C)/P(B ∩ C) = .54/.68 = .7941.


A tree diagram can help. We know that P(day) = .2, P(1-night) = .5, P(2-night) = .3; also, P(purchase | day) = .1, P(purchase | 1-night) = .3, and P(purchase | 2-night) = .2.

Apply Bayes' rule: e.g., P(day | purchase) = P(day ∩ purchase)/P(purchase) = (.2)(.1)/[(.2)(.1) + (.5)(.3) + (.3)(.2)] = .02/.23 = .087.

Similarly, P(1-night | purchase) = (.5)(.3)/.23 = .652 and P(2-night | purchase) = .261.


Let T denote the event that a randomly selected person is, in fact, a terrorist. Apply Bayes' theorem, using P(T) = 1,000/300,000,000 = .0000033:

P(T | +) = P(T)P(+ | T)/[P(T)P(+ | T) + P(T')P(+ | T')] = (.0000033)(.99)/[(.0000033)(.99) + (1 – .0000033)(1 – .999)] = .003289.

That is to say, roughly 0.3% of all people "flagged" as terrorists would be actual terrorists in this scenario.
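
The same calculation can be sketched numerically. The rates .99 and .999 are the ones appearing in the formula above; the names sensitivity and specificity are labels chosen here for illustration, not terms from the exercise. The stated answer .003289 is reproduced when the unrounded value of P(T) is used.

```python
# Prior probability that a randomly selected person is a terrorist
p_t = 1_000 / 300_000_000

sensitivity = 0.99    # P(+ | T), as used in the solution
specificity = 0.999   # P(- | T'), so P(+ | T') = 1 - .999

# Law of total probability for P(+), then Bayes' theorem for P(T | +)
p_flag = p_t * sensitivity + (1 - p_t) * (1 - specificity)
p_t_given_flag = p_t * sensitivity / p_flag

print(round(p_t_given_flag, 6))
```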


The tree diagram below summarizes the information in the exercise (plus the previous information in 
Exercise 59). Probabilities for the branches corresponding to paying with credit are indicated at the far 
right. (“extra” = “plus”) 


a. P(plus ∩ fill ∩ credit) = (.35)(.6)(.6) = .1260.

b. P(premium ∩ no fill ∩ credit) = (.25)(.5)(.4) = .05.

c. From the tree diagram, P(premium ∩ credit) = .0625 + .0500 = .1125.

d. From the tree diagram, P(fill ∩ credit) = .0840 + .1260 + .0625 = .2725.

e. P(credit) = .0840 + .1400 + .1260 + .0700 + .0625 + .0500 = .5325.

f. P(premium | credit) = P(premium ∩ credit)/P(credit) = .1125/.5325 = .2113.

Section 2.5


71. 


To 


79. 


a. Since the events are independent, then A' and B' are independent, too. (See the paragraph below Equation 2.7.) Thus, P(B' | A') = P(B') = 1 – .7 = .3.

b. Using the addition rule, P(A ∪ B) = P(A) + P(B) – P(A ∩ B) = .4 + .7 – (.4)(.7) = .82. Since A and B are independent, we are permitted to write P(A ∩ B) = P(A)P(B) = (.4)(.7).

c. P(A ∩ B' | A ∪ B) = P([A ∩ B'] ∩ [A ∪ B])/P(A ∪ B) = P(A ∩ B')/P(A ∪ B) = P(A)P(B')/P(A ∪ B) = (.4)(.3)/.82 = .146.

From a Venn diagram, P(B) = P(A' ∩ B) + P(A ∩ B) => P(A' ∩ B) = P(B) – P(A ∩ B). If A and B are independent, then P(A' ∩ B) = P(B) – P(A)P(B) = [1 – P(A)]P(B) = P(A')P(B). Thus, A' and B are independent.

Alternatively, P(A' | B) = P(A' ∩ B)/P(B) = [P(B) – P(A ∩ B)]/P(B) = [P(B) – P(A)P(B)]/P(B) = 1 – P(A) = P(A').

Let event E be the event that an error was signaled incorrectly.

We want P(at least one signaled incorrectly) = P(E1 ∪ ... ∪ E10). To use independence, we need intersections, so apply deMorgan's law: P(E1 ∪ ... ∪ E10) = 1 – P(E1' ∩ ... ∩ E10'). P(E') = 1 – .05 = .95, so for 10 independent points, P(E1' ∩ ... ∩ E10') = (.95)...(.95) = (.95)^10. Finally, P(E1 ∪ E2 ∪ ... ∪ E10) = 1 – (.95)^10 = .401. Similarly, for 25 points, the desired probability is 1 – (P(E'))^25 = 1 – (.95)^25 = .723.


Let p denote the probability that a rivet is defective.

a. .15 = P(seam needs reworking) = 1 – P(seam doesn't need reworking) =
1 – P(no rivets are defective) = 1 – P(1st isn't def ∩ ... ∩ 25th isn't def) =
1 – (1 – p)...(1 – p) = 1 – (1 – p)^25.
Solve for p: (1 – p)^25 = .85 => 1 – p = (.85)^(1/25) => p = 1 – .99352 = .00648.

b. The desired condition is .10 = 1 – (1 – p)^25. Again, solve for p: (1 – p)^25 = .90 =>
p = 1 – (.90)^(1/25) = 1 – .99579 = .00421.
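
Solving 1 – (1 – p)^25 = r for p is a one-line inversion. The following sketch wraps it in a small helper (the name rivet_defect_prob is hypothetical) and then plugs the result back in as a sanity check.

```python
def rivet_defect_prob(p_rework, n_rivets=25):
    """Per-rivet defect probability p such that 1 - (1 - p)**n_rivets == p_rework."""
    return 1 - (1 - p_rework) ** (1 / n_rivets)

p_a = rivet_defect_prob(0.15)  # part a: 15% of seams need reworking
p_b = rivet_defect_prob(0.10)  # part b: target of 10%

# Substituting back should recover the original seam-rework probabilities
check_a = 1 - (1 - p_a) ** 25
check_b = 1 - (1 - p_b) ** 25

print(round(p_a, 5), round(p_b, 5))
```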


Let A1 = older pump fails, A2 = newer pump fails, and x = P(A1 ∩ A2). The goal is to find x. From the Venn diagram below, P(A1) = .10 + x and P(A2) = .05 + x. Independence implies that x = P(A1 ∩ A2) = P(A1)P(A2) = (.10 + x)(.05 + x). The resulting quadratic equation, x^2 – .85x + .005 = 0, has roots x = .0059 and x = .8441. The latter is impossible, since the probabilities in the Venn diagram would then exceed 1. Therefore, x = .0059.

Using the hints, let P(Ai) = p and x = p^2. Following the solution provided in the example, P(system lifetime exceeds t0) = p^2 + p^2 – p^4 = 2p^2 – p^4 = 2x – x^2. Now, set this equal to .99:

2x – x^2 = .99 => x^2 – 2x + .99 = 0 => x = 0.9 or 1.1 => p = 1.049 or .9487. Since the value we want is a probability and cannot exceed 1, the correct answer is p = .9487.


We'll need to know P(both detect the defect) = 1 – P(at least one doesn't) = 1 – .2 = .8.

P(1st detects ∩ 2nd doesn't) = P(1st detects) – P(1st does ∩ 2nd does) = .9 – .8 = .1.
Similarly, P(1st doesn't ∩ 2nd does) = .1, so P(exactly one does) = .1 + .1 = .2.

P(neither detects a defect) = 1 – [P(both do) + P(exactly 1 does)] = 1 – [.8 + .2] = 0. That is, under this model there is a 0% probability neither inspector detects a defect. As a result, P(all 3 escape) = (0)(0)(0) = 0.


a. Let D1 = detection on 1st fixation, D2 = detection on 2nd fixation.
P(detection in at most 2 fixations) = P(D1) + P(D1' ∩ D2); since the fixations are independent,
P(D1) + P(D1' ∩ D2) = P(D1) + P(D1')P(D2) = p + (1 – p)p = p(2 – p).

b. Define D1, D2, ..., Dn as in a. Then P(at most n fixations) =
P(D1) + P(D1' ∩ D2) + P(D1' ∩ D2' ∩ D3) + ... + P(D1' ∩ D2' ∩ ... ∩ D(n–1)' ∩ Dn) =
p + (1 – p)p + (1 – p)^2 p + ... + (1 – p)^(n–1) p = p[1 + (1 – p) + (1 – p)^2 + ... + (1 – p)^(n–1)] =
p × [1 – (1 – p)^n]/[1 – (1 – p)] = 1 – (1 – p)^n.
Alternatively, P(at most n fixations) = 1 – P(at least n+1 fixations are required) =
1 – P(no detection in 1st n fixations) = 1 – P(D1' ∩ D2' ∩ ... ∩ Dn') = 1 – (1 – p)^n.

c. P(no detection in 3 fixations) = (1 – p)^3.

d. P(passes inspection) = P({not flawed} ∪ {flawed and passes})
= P(not flawed) + P(flawed and passes)
= .9 + P(flawed) P(passes | flawed) = .9 + (.1)(1 – p)^3.

e. Borrowing from d, P(flawed | passed) = P(flawed ∩ passed)/P(passed) = .1(1 – p)^3/[.9 + .1(1 – p)^3]. For p = .5,
P(flawed | passed) = .1(1 – .5)^3/[.9 + .1(1 – .5)^3] = .0137.

87.

a. Use the information provided and the addition rule:
P(A1 ∪ A2) = P(A1) + P(A2) – P(A1 ∩ A2) => P(A1 ∩ A2) = P(A1) + P(A2) – P(A1 ∪ A2) = .55 + .65 – .80 = .40.

b. By definition, P(A2 | A3) = P(A2 ∩ A3)/P(A3) = .40/.70 = .5714. If a person likes vehicle #3, there's a 57.14% chance s/he will also like vehicle #2.

c. No. From b, P(A2 | A3) = .5714 ≠ P(A2) = .65. Therefore, A2 and A3 are not independent. Alternatively, P(A2 ∩ A3) = .40 ≠ P(A2)P(A3) = (.65)(.70) = .455.

d. The goal is to find P(A2 ∪ A3 | A1'), i.e. P([A2 ∪ A3] ∩ A1')/P(A1'). The denominator is simply 1 – .55 = .45. There are several ways to calculate the numerator; the simplest approach using the information provided is to draw a Venn diagram and observe that P([A2 ∪ A3] ∩ A1') = P(A1 ∪ A2 ∪ A3) – P(A1) = .88 – .55 = .33. Hence, P(A2 ∪ A3 | A1') = .33/.45 = .7333.


89. The question asks for P(exactly one tag lost | at most one tag lost) = P((C1 ∩ C2') ∪ (C1' ∩ C2) | (C1 ∩ C2)'). Since the first event is contained in (a subset of) the second event, this equals

P((C1 ∩ C2') ∪ (C1' ∩ C2))/P((C1 ∩ C2)') = [P(C1 ∩ C2') + P(C1' ∩ C2)]/[1 – P(C1 ∩ C2)] = [P(C1)P(C2') + P(C1')P(C2)]/[1 – P(C1)P(C2)] by independence =

[π(1 – π) + (1 – π)π]/(1 – π^2) = 2π(1 – π)/(1 – π^2) = 2π/(1 + π).


Supplementary Exercises 


91.
a. P(line 1) = 500/1500 = .333;
P(crack) = [.50(500) + .44(400) + .40(600)]/1500 = 666/1500 = .444.

b. This is one of the percentages provided: P(blemish | line 1) = .15.

c. P(surface defect) = [.10(500) + .08(400) + .15(600)]/1500 = 172/1500;
P(line 1 ∩ surface defect) = .10(500)/1500 = 50/1500;
so, P(line 1 | surface defect) = (50/1500)/(172/1500) = 50/172 = .291.

93. Apply the addition rule: P(A ∪ B) = P(A) + P(B) – P(A ∩ B) => .626 = P(A) + P(B) – .144. Apply independence: P(A ∩ B) = P(A)P(B) = .144.
So, P(A) + P(B) = .770 and P(A)P(B) = .144.
Let x = P(A) and y = P(B). Using the first equation, y = .77 – x, and substituting this into the second equation yields x(.77 – x) = .144 or x^2 – .77x + .144 = 0. Use the quadratic formula to solve:

x = [.77 ± sqrt((–.77)^2 – (4)(1)(.144))]/[2(1)] = (.77 ± .13)/2 = .32 or .45. Since x = P(A) is assumed to be the larger probability, x = P(A) = .45 and y = P(B) = .32.


a. There are 5! = 120 possible orderings, so P(BCDEF) = 1/120 = .00833.

b. The number of orderings in which F is third equals 4 × 3 × 1* × 2 × 1 = 24 (*because F must be here), so P(F is third) = 24/120 = .2. Or more simply, since the five friends are ordered completely at random, there is a 1/5 chance F is specifically in position three.

c. Similarly, P(F last) = (4 × 3 × 2 × 1 × 1)/120 = .2.

d. P(F hasn't heard after 10 times) = P(not on #1 ∩ not on #2 ∩ ... ∩ not on #10) = (4/5) × ... × (4/5) = (4/5)^10 = .1074.
97. When three experiments are performed, there are 3 different ways in which detection can occur on exactly 2 of the experiments: (i) #1 and #2 and not #3; (ii) #1 and not #2 and #3; and (iii) not #1 and #2 and #3. If the impurity is present, the probability of exactly 2 detections in three (independent) experiments is (.8)(.8)(.2) + (.8)(.2)(.8) + (.2)(.8)(.8) = .384. If the impurity is absent, the analogous probability is 3(.1)(.1)(.9) = .027. Thus, applying Bayes' theorem, P(impurity is present | detected in exactly 2 out of 3)

= P(detected in exactly 2 ∩ present)/P(detected in exactly 2) = (.384)(.4)/[(.384)(.4) + (.027)(.6)] = .905.
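
The two "exactly 2 of 3" probabilities are binomial probabilities, so the whole calculation can be sketched as follows. The prior P(present) = .4 and the detection rates .8 and .1 are the values used above; binom_pmf is an illustrative helper, not a library function.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

prior_present = 0.4
like_present = binom_pmf(2, 3, 0.8)  # detection rate when the impurity is present
like_absent = binom_pmf(2, 3, 0.1)   # false-detection rate when it is absent

# Bayes' theorem
numerator = like_present * prior_present
posterior_present = numerator / (numerator + like_absent * (1 - prior_present))

print(round(like_present, 3), round(like_absent, 3), round(posterior_present, 3))
```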
99. Refer to the tree diagram below.

a. P(pass inspection) = P(pass initially ∪ passes after recrimping) =
P(pass initially) + P(fails initially ∩ goes to recrimping ∩ is corrected after recrimping) =
.95 + (.05)(.80)(.60) (following path "bad-good-good" on tree diagram) = .974.

b. P(needed no recrimping | passed inspection) = P(passed initially)/P(passed inspection) = .95/.974 = .9754.


101. Let A = 1st functions, B = 2nd functions, so P(B) = .9, P(A ∪ B) = .96, P(A ∩ B) = .75. Use the addition rule:
P(A ∪ B) = P(A) + P(B) – P(A ∩ B) => .96 = P(A) + .9 – .75 => P(A) = .81.

Therefore, P(B | A) = P(A ∩ B)/P(A) = .75/.81 = .926.


103. A tree diagram can also help here.
a. P(E1 ∩ L) = P(E1)P(L | E1) = (.40)(.02) = .008.

b. The law of total probability gives P(L) = Σ P(Ei)P(L | Ei) = (.40)(.02) + (.50)(.01) + (.10)(.05) = .018.

c. P(E1' | L') = 1 – P(E1 | L') = 1 – P(E1 ∩ L')/P(L') = 1 – P(E1)P(L' | E1)/[1 – P(L)] = 1 – (.40)(.98)/(1 – .018) = .601.


105. This is the famous "Birthday Problem" in probability.
a. There are 365^10 possible lists of birthdays, e.g. (Dec 10, Sep 27, Apr 1, ...). Among those, the number with zero matching birthdays is P10,365 (sampling ten birthdays without replacement from 365 days). So,
P(all different) = P10,365/365^10 = (365)(364)···(356)/(365)^10 = .883. P(at least two the same) = 1 – .883 = .117.

b. The general formula is P(at least two the same) = 1 – Pk,365/365^k. By trial and error, this probability equals .476 for k = 22 and equals .507 for k = 23. Therefore, the smallest k for which k people have at least a 50-50 chance of a birthday match is 23.

c. There are 1000 possible 3-digit sequences to end a SS number (000 through 999). Using the idea from a, P(at least two have the same SS ending) = 1 – P10,1000/1000^10 = 1 – .956 = .044.

Assuming birthdays and SS endings are independent, P(at least one "coincidence") = P(birthday coincidence ∪ SS coincidence) = .117 + .044 – (.117)(.044) = .156.
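
The "trial and error" in part b is easy to automate. The sketch below defines a hypothetical helper p_match(k, days) for the probability of at least one match among k people when each of days values is equally likely, and reuses it for the SS-ending calculation in part c.

```python
def p_match(k, days=365):
    """P(at least two of k people share a value) with `days` equally likely values."""
    p_all_different = 1.0
    for i in range(k):
        p_all_different *= (days - i) / days
    return 1 - p_all_different

p_birthday_10 = p_match(10)        # part a
p_ss_10 = p_match(10, days=1000)   # part c: 1000 possible 3-digit endings

# Part b: smallest k with at least a 50-50 chance of a birthday match
smallest_k = next(k for k in range(1, 366) if p_match(k) >= 0.5)

# Union of the two (assumed independent) coincidences
p_either = p_birthday_10 + p_ss_10 - p_birthday_10 * p_ss_10

print(round(p_birthday_10, 3), round(p_ss_10, 3), smallest_k, round(p_either, 3))
```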


107. P(detection by the end of the nth glimpse) = 1 – P(not detected in first n glimpses) =

1 – P(G1' ∩ G2' ∩ ... ∩ Gn') = 1 – P(G1')P(G2')···P(Gn') = 1 – (1 – p1)(1 – p2)···(1 – pn) = 1 – Π(1 – pi), where the product runs over i = 1, ..., n.


109.

a. P(all in correct room) = 1/4! = 1/24 = .0417.

b. The 9 outcomes which yield completely incorrect assignments are: 2143, 2341, 2413, 3142, 3412, 3421, 4123, 4321, and 4312, so P(all incorrect) = 9/24 = .375.

111. Note: s = 0 means that the very first candidate interviewed is hired. Each entry below is the candidate hired for the given policy and outcome.

Outcome  s=0  s=1  s=2  s=3     Outcome  s=0  s=1  s=2  s=3
1234      1    4    4    4      3124      3    1    4    4
1243      1    3    3    3      3142      3    1    2    2
1324      1    4    4    4      3214      3    2    1    4
1342      1    2    2    2      3241      3    2    1    1
1423      1    3    3    3      3412      3    1    1    2
1432      1    2    2    2      3421      3    2    2    1
2134      2    1    4    4      4123      4    1    3    3
2143      2    1    3    3      4132      4    1    2    2
2314      2    1    1    4      4213      4    2    1    3
2341      2    1    1    1      4231      4    2    1    1
2413      2    1    1    3      4312      4    3    1    2
2431      2    1    1    1      4321      4    3    2    1

From the table, we derive the following probability distribution based on s:

s             0       1       2       3
P(hire #1)    6/24    11/24   10/24   6/24

Therefore s = 1 is the best policy.
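
The comparison of policies can be checked by enumerating all 24 interview orders. This sketch assumes the hiring rule implied by the table entries: pass over the first s candidates, then hire the first later candidate who is better than all of those, or the last candidate if no one is. The function name hired and the encoding of an outcome as a tuple of ranks (1 = best) are illustrative choices.

```python
from fractions import Fraction
from itertools import permutations

def hired(order, s):
    """Rank of the candidate hired under policy s for a given interview order."""
    # Best (smallest) rank among the first s candidates; a sentinel if s == 0
    best_seen = min(order[:s], default=len(order) + 1)
    for rank in order[s:]:
        if rank < best_seen:
            return rank
    return order[-1]  # nobody beat the first s candidates: the last one is hired

orders = list(permutations([1, 2, 3, 4]))
p_hire_best = {
    s: Fraction(sum(hired(o, s) == 1 for o in orders), len(orders))
    for s in range(4)
}
best_s = max(p_hire_best, key=p_hire_best.get)

print(p_hire_best, best_s)
```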


113. P(A1) = P(draw slip 1 or 4) = 1/2; P(A2) = P(draw slip 2 or 4) = 1/2;
P(A3) = P(draw slip 3 or 4) = 1/2; P(A1 ∩ A2) = P(draw slip 4) = 1/4;
P(A2 ∩ A3) = P(draw slip 4) = 1/4; P(A1 ∩ A3) = P(draw slip 4) = 1/4.
Hence P(A1 ∩ A2) = P(A1)P(A2) = 1/4; P(A2 ∩ A3) = P(A2)P(A3) = 1/4; and
P(A1 ∩ A3) = P(A1)P(A3) = 1/4. Thus, there exists pairwise independence. However,
P(A1 ∩ A2 ∩ A3) = P(draw slip 4) = 1/4 ≠ 1/8 = P(A1)P(A2)P(A3), so the events are not mutually independent.

CHAPTER 3


Section 3.1 


1.

S: FFF SFF FSF FFS FSS SFS SSF SSS
X:  0   1   1   1   2   2   2   3

3. Examples include: M = the difference between the larger and the smaller outcome with possible values 0, 1, 2, 3, 4, or 5; T = 1 if the sum of the two resulting numbers is even and T = 0 otherwise, a Bernoulli random variable. See the back of the book for other examples.

5. No. In the experiment in which a coin is tossed repeatedly until a H results, let Y = 1 if the experiment terminates with at most 5 tosses and Y = 0 otherwise. The sample space is infinite, yet Y has only two possible values. See the back of the book for another example.

7.

a. Possible values of X are 0, 1, 2, ..., 12; discrete.

b. With N = # on the list, values of Y are 0, 1, 2, ..., N; discrete.

c. Possible values of U are 1, 2, 3, 4, ...; discrete.

d. Possible values of X are (0, ∞) if we assume that a rattlesnake can be arbitrarily short or long; not discrete.

e. Possible values of Z are all possible sales tax percentages for online purchases, but there are only finitely many of these. Since we could list these different percentages {z1, z2, ..., zN}, Z is discrete.

f. Since 0 is the smallest possible pH and 14 is the largest possible pH, possible values of Y are [0, 14]; not discrete.

g. With m and M denoting the minimum and maximum possible tension, respectively, possible values of X are [m, M]; not discrete.

h. The number of possible tries is 1, 2, 3, ...; each try involves 3 racket spins, so possible values of X are 3, 6, 9, 12, 15, ...; discrete.

9. 


a. Returns to 0 can occur only after an even number of tosses, so possible X values are 2, 4, 6, 8, .... Because the values of X are enumerable, X is discrete.

b. Now a return to 0 is possible after any number of tosses greater than 1, so possible values are 2, 3, 4, 5, .... Again, X is discrete.

P(X ≥ 2) = p(2) + p(3) + p(4) = .30 + .15 + .10 = .55, while P(X > 2) = .15 + .10 = .25.

P(1 ≤ X ≤ 3) = p(1) + p(2) + p(3) = .25 + .30 + .15 = .70.

Who knows? (This is just a little joke by the author.)

13.

P(X ≤ 3) = p(0) + p(1) + p(2) + p(3) = .10 + .15 + .20 + .25 = .70.

P(X < 3) = P(X ≤ 2) = p(0) + p(1) + p(2) = .45.

P(X ≥ 3) = p(3) + p(4) + p(5) + p(6) = .55.

P(2 ≤ X ≤ 5) = p(2) + p(3) + p(4) + p(5) = .71.

The number of lines not in use is 6 – X, and P(2 ≤ 6 – X ≤ 4) = P(–4 ≤ –X ≤ –2) = P(2 ≤ X ≤ 4) = p(2) + p(3) + p(4) = .65.

P(6 – X ≥ 4) = P(X ≤ 2) = .10 + .15 + .20 = .45.


(1,2) (1,3) (1,4) (1,5) (2,3) (2,4) (2,5) (3,4) (3,5) (4,5)

X can only take on the values 0, 1, 2. p(0) = P(X = 0) = P({(3,4) (3,5) (4,5)}) = 3/10 = .3;
p(2) = P(X = 2) = P({(1,2)}) = 1/10 = .1; p(1) = P(X = 1) = 1 – [p(0) + p(2)] = .60; and otherwise p(x) = 0.

F(0) = P(X ≤ 0) = P(X = 0) = .30;
F(1) = P(X ≤ 1) = P(X = 0 or 1) = .30 + .60 = .90;
F(2) = P(X ≤ 2) = 1.

Therefore, the complete cdf of X is

F(x) = 0     for x < 0
F(x) = .30   for 0 ≤ x < 1
F(x) = .90   for 1 ≤ x < 2
F(x) = 1     for 2 ≤ x


17.
a. p(2) = P(Y = 2) = P(first 2 batteries are acceptable) = P(AA) = (.9)(.9) = .81.

b. p(3) = P(Y = 3) = P(UAA or AUA) = (.1)(.9)(.9) + (.9)(.1)(.9) = 2[(.1)(.9)^2] = .162.

c. The fifth battery must be an A, and exactly one of the first four must also be an A.
Thus, p(5) = P(AUUUA or UAUUA or UUAUA or UUUAA) = 4[(.1)^3(.9)^2] = .00324.

d. p(y) = P(the yth is an A and so is exactly one of the first y – 1) = (y – 1)(.1)^(y–2)(.9)^2, for y = 2, 3, 4, 5, ....

19. p(0) = P(Y = 0) = P(both arrive on Wed) = (.3)(.3) = .09;
p(1) = P(Y = 1) = P((W,Th) or (Th,W) or (Th,Th)) = (.3)(.4) + (.4)(.3) + (.4)(.4) = .40;
p(2) = P(Y = 2) = P((W,F) or (Th,F) or (F,W) or (F,Th) or (F,F)) = .32;
p(3) = 1 – [.09 + .40 + .32] = .19.


21.
a. First, 1 + 1/x > 1 for all x = 1, ..., 9, so log(1 + 1/x) > 0. Next, check that the probabilities sum to 1:

Σ log10(1 + 1/x) = Σ log10((x + 1)/x) = log10(2/1) + log10(3/2) + ··· + log10(10/9), with each sum running over x = 1, ..., 9; using properties of logs, this equals log10[(2/1) × (3/2) × ··· × (10/9)] = log10(10) = 1.

b. Using the formula p(x) = log10(1 + 1/x) gives the following values: p(1) = .301, p(2) = .176, p(3) = .125, p(4) = .097, p(5) = .079, p(6) = .067, p(7) = .058, p(8) = .051, p(9) = .046. The distribution specified by Benford's Law is not uniform on these nine digits; rather, lower digits (such as 1 and 2) are much more likely to be the lead digit of a number than higher digits (such as 8 and 9).

c. The jumps in F(x) occur at 1, ..., 9. We display the cumulative probabilities here: F(1) = .301, F(2) = .477, F(3) = .602, F(4) = .699, F(5) = .778, F(6) = .845, F(7) = .903, F(8) = .954, F(9) = 1. So, F(x) = 0 for x < 1; F(x) = .301 for 1 ≤ x < 2; F(x) = .477 for 2 ≤ x < 3; etc.

d. P(X ≤ 3) = F(3) = .602; P(X ≥ 5) = 1 – P(X < 5) = 1 – P(X ≤ 4) = 1 – F(4) = 1 – .699 = .301.
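
The Benford pmf and cdf values in parts b-d can be tabulated directly. This is a minimal sketch; the dictionary names pmf and cdf are illustrative.

```python
from math import log10

# Benford's Law: p(x) = log10(1 + 1/x) for lead digits x = 1, ..., 9
pmf = {x: log10(1 + 1 / x) for x in range(1, 10)}

# Cumulative probabilities F(x) = p(1) + ... + p(x)
cdf = {}
running = 0.0
for x in range(1, 10):
    running += pmf[x]
    cdf[x] = running

p_at_most_3 = cdf[3]        # P(X <= 3)
p_at_least_5 = 1 - cdf[4]   # P(X >= 5) = 1 - F(4)

print({x: round(p, 3) for x, p in pmf.items()})
print(round(p_at_most_3, 3), round(p_at_least_5, 3))
```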


23.
a. p(2) = P(X = 2) = F(2) – F(1) = .39 – .19 = .20.

b. P(X > 3) = 1 – P(X ≤ 3) = 1 – F(3) = 1 – .67 = .33.

c. P(2 ≤ X ≤ 5) = F(5) – F(2 – 1) = F(5) – F(1) = .97 – .19 = .78.

d. P(2 < X < 5) = P(2 < X ≤ 4) = F(4) – F(2) = .92 – .39 = .53.

25. p(0) = P(Y = 0) = P(B first) = p;
p(1) = P(Y = 1) = P(G first, then B) = (1 – p)p;
p(2) = P(Y = 2) = P(GGB) = (1 – p)^2 p;
Continuing, p(y) = P(y Gs and then a B) = (1 – p)^y p for y = 0, 1, 2, 3, ....


27.
a. The sample space consists of all possible permutations of the four numbers 1, 2, 3, 4:

outcome  x value     outcome  x value     outcome  x value
1234     4           2314     1           3412     0
1243     2           2341     0           3421     0
1324     2           2413     0           4123     0
1342     1           2431     1           4132     1
1423     1           3124     1           4213     1
1432     2           3142     0           4231     2
2134     2           3214     2           4312     0
2143     0           3241     1           4321     0

b. From the table in a, p(0) = P(X = 0) = 9/24, p(1) = P(X = 1) = 8/24, p(2) = P(X = 2) = 6/24, p(3) = P(X = 3) = 0, and p(4) = P(X = 4) = 1/24.


Section 3.3 


29.

a. E(X) = Σ x·p(x) = 1(.05) + 2(.10) + 4(.35) + 8(.40) + 16(.10) = 6.45 GB.

b. V(X) = Σ (x – μ)^2·p(x) = (1 – 6.45)^2(.05) + (2 – 6.45)^2(.10) + ... + (16 – 6.45)^2(.10) = 15.6475.

c. σ = sqrt(V(X)) = sqrt(15.6475) = 3.956 GB.

d. E(X^2) = Σ x^2·p(x) = 1^2(.05) + 2^2(.10) + 4^2(.35) + 8^2(.40) + 16^2(.10) = 57.25. Using the shortcut formula, V(X) = E(X^2) – μ^2 = 57.25 – (6.45)^2 = 15.6475.
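
Both routes to the variance (the definition in b and the shortcut in d) can be compared numerically. The pmf below is the one used in part a; the variable names are illustrative.

```python
from math import sqrt

# pmf of X (in GB), as used in part a
pmf = {1: 0.05, 2: 0.10, 4: 0.35, 8: 0.40, 16: 0.10}

mean = sum(x * p for x, p in pmf.items())                       # E(X)
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())
second_moment = sum(x ** 2 * p for x, p in pmf.items())         # E(X^2)
var_shortcut = second_moment - mean ** 2
sd = sqrt(var_definition)

print(round(mean, 2), round(var_definition, 4), round(var_shortcut, 4), round(sd, 3))
```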


31. From the table in Exercise 12, E(Y) = 45(.05) + 46(.10) + ... + 55(.01) = 48.84; similarly,
E(Y²) = 45²(.05) + 46²(.10) + ... + 55²(.01) = 2389.84; thus V(Y) = E(Y²) − [E(Y)]² = 2389.84 − (48.84)² =
4.4944 and σ_Y = √4.4944 = 2.12.
One standard deviation from the mean value of Y gives 48.84 ± 2.12 = 46.72 to 50.96. So, the probability Y
is within one standard deviation of its mean value equals P(46.72 < Y < 50.96) = P(Y = 47, 48, 49, 50) =
.12 + .14 + .25 + .17 = .68.
33.
a. E(X²) = Σ x²·p(x) (sum over x = 0, 1) = 0²(1 − p) + 1²(p) = p.
b. V(X) = E(X²) − [E(X)]² = p − [p]² = p(1 − p).
c. E(X⁷⁹) = 0⁷⁹(1 − p) + 1⁷⁹(p) = p. In fact, E(Xⁿ) = p for any positive power n.
35. Let h₃(X) and h₄(X) equal the net revenue (sales revenue minus order cost) for 3 and 4 copies purchased,
respectively. If 3 magazines are ordered ($6 spent), net revenue is $4 − $6 = −$2 if X = 1, 2($4) − $6 = $2
if X = 2, 3($4) − $6 = $6 if X = 3, and also $6 if X = 4, 5, or 6 (since that additional demand simply isn’t
met). The values of h₄(X) can be deduced similarly. Both distributions are summarized below.

x        1      2      3      4      5      6
h₃(x)    −2     2      6      6      6      6
h₄(x)    −4     0      4      8      8      8
p(x)     1/15   2/15   3/15   4/15   3/15   2/15

Using the table, E[h₃(X)] = Σ h₃(x)·p(x) (sum over x = 1, ..., 6) = (−2)(1/15) + ... + (6)(2/15) = $4.93.

Similarly, E[h₄(X)] = Σ h₄(x)·p(x) (sum over x = 1, ..., 6) = (−4)(1/15) + ... + (8)(2/15) = $5.33.

Therefore, ordering 4 copies gives slightly higher revenue, on the average.
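The two expected revenues can be recomputed exactly. This sketch assumes a demand pmf of 1/15, 2/15, 3/15, 4/15, 3/15, 2/15 for x = 1, ..., 6 (the values that reproduce the stated $4.93 and $5.33), a sale price of $4, and a cost of $2 per copy ordered; the function names are hypothetical.

```python
from fractions import Fraction

# Assumed demand pmf for x = 1..6 (numerators over 15); see the lead-in.
demand_pmf = {x: Fraction(k, 15) for x, k in zip(range(1, 7), (1, 2, 3, 4, 3, 2))}

PRICE = 4  # dollars received per copy sold
COST = 2   # dollars paid per copy ordered (3 copies cost $6)


def net_revenue(copies_ordered: int, demand: int) -> int:
    """Sales revenue minus order cost; unmet demand earns nothing."""
    return PRICE * min(demand, copies_ordered) - COST * copies_ordered


def expected_net_revenue(copies_ordered: int) -> Fraction:
    return sum(net_revenue(copies_ordered, x) * p for x, p in demand_pmf.items())


e3 = expected_net_revenue(3)
e4 = expected_net_revenue(4)
print(e3, float(e3))
print(e4, float(e4))
```

Working in exact fractions avoids rounding questions when the two expectations are close.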


37. Using the hint, E(X) = Σ x·(1/n) (sum over x = 1, ..., n) = (1/n)·Σ x = (1/n)·[n(n + 1)/2] = (n + 1)/2. Similarly,
E(X²) = Σ x²·(1/n) = (1/n)·Σ x² = (1/n)·[n(n + 1)(2n + 1)/6] = (n + 1)(2n + 1)/6, so

V(X) = (n + 1)(2n + 1)/6 − [(n + 1)/2]² = (n² − 1)/12.


39. From the table, E(X) = Σ x·p(x) = 2.3, E(X²) = 6.1, and V(X) = 6.1 − (2.3)² = .81. Each lot weighs 5 lbs, so
the number of pounds left = 100 − 5X. Thus the expected weight left is E(100 − 5X) = 100 − 5E(X) =
88.5 lbs, and the variance of the weight left is V(100 − 5X) = V(−5X) = (−5)²V(X) = 25V(X) = 20.25.

41. Use the hint: V(aX + b) = E[((aX + b) − E(aX + b))²] = Σ [ax + b − E(aX + b)]²·p(x) =
Σ [ax + b − (aμ + b)]²·p(x) = Σ [ax − aμ]²·p(x) = a²·Σ (x − μ)²·p(x) = a²V(X).

43. With a = 1 and b = −c, E(X − c) = E(aX + b) = a·E(X) + b = E(X) − c.
When c = μ, E(X − μ) = E(X) − μ = μ − μ = 0; i.e., the expected deviation from the mean is zero.

45. a ≤ X ≤ b means that a ≤ x ≤ b for all x in the range of X. Hence a·p(x) ≤ x·p(x) ≤ b·p(x) for all x, and
Σ a·p(x) ≤ Σ x·p(x) ≤ Σ b·p(x)
a·Σ p(x) ≤ Σ x·p(x) ≤ b·Σ p(x)
a·1 ≤ E(X) ≤ b·1
a ≤ E(X) ≤ b


Section 3.4 


47.
a. B(4; 15, .7) = .001.

b. b(4; 15, .7) = B(4; 15, .7) − B(3; 15, .7) = .001 − .000 = .001.

c. Now p = .3 (multiple vehicles). b(6; 15, .3) = B(6; 15, .3) − B(5; 15, .3) = .869 − .722 = .147.

d. P(2 ≤ X ≤ 4) = B(4; 15, .7) − B(1; 15, .7) = .001.

e. P(2 ≤ X) = 1 − P(X ≤ 1) = 1 − B(1; 15, .7) = 1 − .000 = 1.

f. The information that 11 accidents involved multiple vehicles is redundant (since n = 15 and x = 4). So,
this is actually identical to b, and the answer is .001.


49. Let X be the number of “seconds,” so X ~ Bin(6, .10). Below, C(n, x) denotes the binomial coefficient.
a. P(X = 1) = C(n, x)·p^x·(1 − p)^(n−x) = C(6, 1)(.1)¹(.9)⁵ = .3543.

b. P(X ≥ 2) = 1 − [P(X = 0) + P(X = 1)] = 1 − [C(6, 0)(.1)⁰(.9)⁶ + C(6, 1)(.1)¹(.9)⁵] = 1 − [.5314 + .3543] =
.1143.

c. Either 4 or 5 goblets must be selected.
Select 4 goblets with zero defects: P(X = 0) = C(4, 0)(.1)⁰(.9)⁴ = .6561.
Select 4 goblets, one of which has a defect, and the 5th is good: [C(4, 1)(.1)¹(.9)³] × .9 = .26244.
So, the desired probability is .6561 + .26244 = .91854.


51. Let X be the number of faxes, so X ~ Bin(25, .25).
a. E(X) = np = 25(.25) = 6.25.

b. V(X) = np(1 − p) = 25(.25)(.75) = 4.6875, so SD(X) = 2.165.

c. P(X > 6.25 + 2(2.165)) = P(X > 10.58) = 1 − P(X ≤ 10.58) = 1 − P(X ≤ 10) = 1 − B(10; 25, .25) = .030.
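The table lookups above can be reproduced without a binomial table. A minimal sketch using only the standard library; `binom_pmf` and `binom_cdf` are names introduced here to mirror the b(x; n, p) and B(x; n, p) notation.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, p) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binom_cdf(x: int, n: int, p: float) -> float:
    """B(x; n, p) = P(X <= x), summing the pmf from 0 through x."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))


# "Seconds" among 6 goblets, X ~ Bin(6, .10)
p_one_second = binom_pmf(1, 6, 0.10)
p_two_or_more = 1 - binom_cdf(1, 6, 0.10)

# Faxes, X ~ Bin(25, .25): P(X > 10) = 1 - B(10; 25, .25)
fax_tail = 1 - binom_cdf(10, 25, 0.25)

print(round(p_one_second, 4), round(p_two_or_more, 4), round(fax_tail, 3))
```

Summing the pmf term by term is adequate for the small n used in these exercises.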
53. Let “success” = has at least one citation and define X = number of individuals with at least one citation.
Then X ~ Bin(n = 15, p = .4).
a. If at least 10 have no citations (failure), then at most 5 have had at least one (success):
P(X ≤ 5) = B(5; 15, .40) = .403.

b. Half of 15 is 7.5, so less than half means 7 or fewer: P(X ≤ 7) = B(7; 15, .40) = .787.

c. P(5 ≤ X ≤ 10) = P(X ≤ 10) − P(X ≤ 4) = .991 − .217 = .774.


55. Let “success” correspond to a telephone that is submitted for service while under warranty and must be
replaced. Then p = P(success) = P(replaced | submitted)·P(submitted) = (.40)(.20) = .08. Thus X, the
number among the company’s 10 phones that must be replaced, has a binomial distribution with n = 10 and
p = .08, so P(X = 2) = C(10, 2)(.08)²(.92)⁸ = .1478.


57. Let X = the number of flashlights that work, and let event B = {battery has acceptable voltage}. 
Then P(flashlight works) = P(both batteries work) = P(B)P(B) = (.9)(.9) = .81. We have assumed here that 
the batteries’ voltage levels are independent. 
Finally, X ~ Bin(10, .81), so P(X ≥ 9) = P(X = 9) + P(X = 10) = .285 + .122 = .407.


59. In this example, X ~ Bin(25, p) with p unknown.
a. P(rejecting claim when p = .8) = P(X ≤ 15 when p = .8) = B(15; 25, .8) = .017.

b. P(not rejecting claim when p = .7) = P(X > 15 when p = .7) = 1 − P(X ≤ 15 when p = .7) =
1 − B(15; 25, .7) = 1 − .189 = .811.
For p = .6, this probability is 1 − B(15; 25, .6) = 1 − .575 = .425.

c. The probability of rejecting the claim when p = .8 becomes B(14; 25, .8) = .006, smaller than in a
above. However, the probabilities of b above increase to .902 and .586, respectively. So, by changing
15 to 14, we’re making it less likely that we will reject the claim when it’s true (p really is ≥ .8), but
more likely that we’ll “fail” to reject the claim when it’s false (p really is < .8).


61. If topic A is chosen, then n = 2. When n = 2, P(at least half received) = P(X ≥ 1) = 1 − P(X = 0) =
1 − C(2, 0)(.9)⁰(.1)² = .99.
If topic B is chosen, then n = 4. When n = 4, P(at least half received) = P(X ≥ 2) = 1 − P(X ≤ 1) =
1 − [C(4, 0)(.9)⁰(.1)⁴ + C(4, 1)(.9)¹(.1)³] = .9963.

Thus topic B should be chosen if p = .9.

However, if p = .5, then the probabilities are .75 for A and .6875 for B (using the same method as above),
so now A should be chosen.


63.
a. b(x; n, 1 − p) = C(n, x)(1 − p)^x·(p)^(n−x) = C(n, n − x)(p)^(n−x)·(1 − p)^x = b(n − x; n, p).

Conceptually, P(x S’s when P(S) = 1 − p) = P(n − x F’s when P(F) = p), since the two events are
identical, but the labels S and F are arbitrary and so can be interchanged (if P(S) and P(F) are also
interchanged), yielding P(n − x S’s when P(S) = p) as desired.

b. Use the conceptual idea from a: B(x; n, 1 − p) = P(at most x S’s when P(S) = 1 − p) =
P(at least n − x F’s when P(F) = p), since these are the same event
= P(at least n − x S’s when P(S) = p), since the S and F labels are arbitrary
= 1 − P(at most n − x − 1 S’s when P(S) = p) = 1 − B(n − x − 1; n, p).

c. Whenever p > .5, (1 − p) < .5, so probabilities involving X can be calculated using the results a and b in
combination with tables giving probabilities only for p ≤ .5.
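Both identities are easy to spot-check numerically. This sketch assumes illustrative values n = 15 and p = .7 (any n and 0 < p < 1 would do); `b` and `B` are local helper names mirroring the notation above.

```python
from math import comb, isclose


def b(x: int, n: int, p: float) -> float:
    """Binomial pmf b(x; n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def B(x: int, n: int, p: float) -> float:
    """Binomial cdf B(x; n, p); an empty sum (x < 0) gives 0."""
    return sum(b(k, n, p) for k in range(x + 1))


n, p = 15, 0.7  # assumed values, for illustration only

# Part a: b(x; n, 1 - p) = b(n - x; n, p)
pmf_identity_holds = all(
    isclose(b(x, n, 1 - p), b(n - x, n, p)) for x in range(n + 1)
)

# Part b: B(x; n, 1 - p) = 1 - B(n - x - 1; n, p)
cdf_identity_holds = all(
    isclose(B(x, n, 1 - p), 1 - B(n - x - 1, n, p), abs_tol=1e-12)
    for x in range(n + 1)
)

print(pmf_identity_holds, cdf_identity_holds)
```

A numerical check like this does not replace the argument in a and b, but it catches off-by-one slips in the index n − x − 1.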


65. 

a. Although there are three payment methods, we are only concerned with S = uses a debit card and F = 
does not use a debit card. Thus we can use the binomial distribution. So, if X = the number of 
customers who use a debit card, X ~ Bin(n = 100, p = .2). From this, 

E(X) = np = 100(.2) = 20, and V(X) = npq = 100(.2)(1 − .2) = 16.


b. With S = doesn’t pay with cash, n = 100 and p = .7, so μ = np = 100(.7) = 70, and V = 21.
67. When n = 20 and p = .5, μ = 10 and σ = 2.236, so 2σ = 4.472 and 3σ = 6.708.
The inequality |X − 10| ≥ 4.472 is satisfied if either X ≤ 5 or X ≥ 15, or
P(|X − μ| ≥ 2σ) = P(X ≤ 5 or X ≥ 15) = .021 + .021 = .042. The inequality |X − 10| ≥ 6.708 is satisfied if
either X ≤ 3 or X ≥ 17, so P(|X − μ| ≥ 3σ) = P(X ≤ 3 or X ≥ 17) = .001 + .001 = .002.


Section 3.5 


69. According to the problem description, X is hypergeometric with n = 6, N = 12, and M = 7.

b. E(X) = n·(M/N) = 6·(7/12) = 3.5; V(X) = [(12 − 6)/(12 − 1)]·6·(7/12)·(1 − 7/12) = 0.795; σ = 0.892. So,
P(X > μ + σ) = P(X > 3.5 + 0.892) = P(X > 4.392) = P(X = 5 or 6) = .121 (from part a).

c. We can approximate the hypergeometric distribution with the binomial if the population size and the
number of successes are large. Here, n = 15 and M/N = 40/400 = .1, so
h(x; 15, 40, 400) ≈ b(x; 15, .10). Using this approximation, P(X ≤ 5) ≈ B(5; 15, .10) = .998 from the
binomial tables. (This agrees with the exact answer to 3 decimal places.)
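The hypergeometric quantities used above can be computed directly. A minimal sketch; `hyper_pmf` is a name introduced here for h(x; n, M, N), with the argument order following the notation in the text.

```python
from math import comb


def hyper_pmf(x: int, n: int, M: int, N: int) -> float:
    """h(x; n, M, N): x successes in a sample of n from N items, M of them successes."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


# n = 6, M = 7, N = 12
upper_tail = hyper_pmf(5, 6, 7, 12) + hyper_pmf(6, 6, 7, 12)   # P(X = 5 or 6)
mean = 6 * 7 / 12
variance = ((12 - 6) / (12 - 1)) * 6 * (7 / 12) * (1 - 7 / 12)

# n = 15, M = 40, N = 400: exact value versus the binomial approximation with p = .10
exact = sum(hyper_pmf(x, 15, 40, 400) for x in range(6))
approx = sum(comb(15, x) * 0.1 ** x * 0.9 ** (15 - x) for x in range(6))

print(round(upper_tail, 3), mean, round(variance, 3))
print(exact, approx)
```

Printing both `exact` and `approx` shows how close the binomial approximation is when the sample is a small fraction of the population.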
71. Possible values of X are 5, 6, 7, 8, 9, 10. (In order to have less than 5 of the granite, there would have
to be more than 10 of the basaltic.) X is hypergeometric, with n = 15, N = 20, and M = 10. So, the pmf is

p(x) = h(x; 15, 10, 20) = C(10, x)·C(10, 15 − x)/C(20, 15).

The pmf is also provided in table form below.

x      5      6      7      8      9      10
p(x)   .0163  .1354  .3483  .3483  .1354  .0163


P(all 10 of one kind or the other) = P(X = 5) + P(X = 10) = .0163 + .0163 = .0326.


μ = n·(M/N) = 15·(10/20) = 7.5; V(X) = [(20 − 15)/(20 − 1)]·15·(10/20)·(1 − 10/20) = .9868; σ = .9934.

μ ± σ = 7.5 ± .9934 = (6.5066, 8.4934), so we want P(6.5066 < X < 8.4934). That equals
P(X = 7) + P(X = 8) = .3483 + .3483 = .6966.


73. The successes here are the top M = 10 pairs, and a sample of n = 10 pairs is drawn from among the N
= 20. The probability is therefore h(x; 10, 10, 20) = C(10, x)·C(10, 10 − x)/C(20, 10).


Let X = the number among the top 5 who play east-west. (Now, M = 5.)
Then P(all of top 5 play the same direction) = P(X = 5) + P(X = 0) =
h(5; 10, 5, 20) + h(0; 10, 5, 20) = C(5, 5)·C(15, 5)/C(20, 10) + C(5, 0)·C(15, 10)/C(20, 10) = 6006/184,756 ≈ .0325.


Generalizing from earlier parts, we now have N = 2n; M = n. The probability distribution of X is
hypergeometric: p(x) = h(x; n, n, 2n) = C(n, x)·C(n, n − x)/C(2n, n) for x = 0, 1, ..., n. Also,
E(X) = n·(n/2n) = n/2 and V(X) = [(2n − n)/(2n − 1)]·n·(n/2n)·(1 − n/2n) = n²/[4(2n − 1)].
75. Let X = the number of boxes that do not contain a prize until you find 2 prizes. Then X ~ NB(2, .2).
a. With S = a female child and F = a male child, let X = the number of F’s before the 2nd S. Then
P(X = x) = nb(x; 2, .2) = C(x + 2 − 1, 2 − 1)·(.2)²(1 − .2)^x = (x + 1)(.2)²(.8)^x.

b. P(4 boxes purchased) = P(2 boxes without prizes) = P(X = 2) = nb(2; 2, .2) = (2 + 1)(.2)²(.8)² = .0768.

c. P(at most 4 boxes purchased) = P(X ≤ 2) = Σ nb(x; 2, .2) (sum over x = 0, 1, 2) = .04 + .064 + .0768 = .1808.

d. E(X) = r(1 − p)/p = 2(1 − .2)/.2 = 8. The total number of boxes you expect to buy is 8 + 2 = 10.
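The negative binomial values in parts b through d can be recomputed as follows. A minimal sketch; `nb_pmf` is a name introduced here for nb(x; r, p), where x counts failures before the r-th success.

```python
from math import comb


def nb_pmf(x: int, r: int, p: float) -> float:
    """nb(x; r, p): probability of exactly x failures before the r-th success."""
    return comb(x + r - 1, r - 1) * p ** r * (1 - p) ** x


r, p = 2, 0.2

p_four_boxes = nb_pmf(2, r, p)                              # 2 boxes without a prize
p_at_most_four = sum(nb_pmf(x, r, p) for x in range(3))     # x = 0, 1, 2
expected_empty_boxes = r * (1 - p) / p
expected_total_boxes = expected_empty_boxes + r

print(p_four_boxes, p_at_most_four, expected_empty_boxes, expected_total_boxes)
```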
p : 


77. This is identical to an experiment in which a single family has children until exactly 6 females have been
born (since p = .5 for each of the three families). So,
p(x) = nb(x; 6, .5) = C(x + 5, 5)·(.5)⁶(1 − .5)^x = C(x + 5, 5)·(.5)^(6+x). Also, μ = r(1 − p)/p = 6(1 − .5)/.5 = 6; notice this is
just 2 + 2 + 2, the sum of the expected number of males born to each family.


Section 3.6 


79. All these solutions are found using the cumulative Poisson table, F(x; μ) = F(x; 1).
a. P(X ≤ 5) = F(5; 1) = .999.

b. P(X = 2) = e⁻¹·1²/2! = .184. Or, P(X = 2) = F(2; 1) − F(1; 1) = .920 − .736 = .184.

c. P(2 ≤ X ≤ 4) = P(X ≤ 4) − P(X ≤ 1) = F(4; 1) − F(1; 1) = .260.
d. For X Poisson, σ = √μ = 1, so P(X > μ + σ) = P(X > 2) = 1 − P(X ≤ 2) = 1 − F(2; 1) = 1 − .920 = .080.


81. Let X ~ Poisson(μ = 20).
a. P(X ≤ 10) = F(10; 20) = .011.

b. P(X > 20) = 1 − F(20; 20) = 1 − .559 = .441.

c. P(10 ≤ X ≤ 20) = F(20; 20) − F(9; 20) = .559 − .005 = .554;
P(10 < X < 20) = F(19; 20) − F(10; 20) = .470 − .011 = .459.

d. E(X) = μ = 20, so σ = √20 = 4.472. Therefore, P(μ − 2σ < X < μ + 2σ) =
P(20 − 8.944 < X < 20 + 8.944) = P(11.056 < X < 28.944) = P(X ≤ 28) − P(X ≤ 11) =
F(28; 20) − F(11; 20) = .966 − .021 = .945.
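The cumulative Poisson table values used above can be regenerated in a few lines. A minimal sketch; `poisson_pmf` and `poisson_cdf` are names introduced here, with `poisson_cdf(x, mu)` playing the role of F(x; μ). Small differences in the third decimal are expected, because the solution subtracts table entries that were already rounded.

```python
from math import exp, factorial, sqrt


def poisson_pmf(x: int, mu: float) -> float:
    """p(x; mu) = e^(-mu) mu^x / x!."""
    return exp(-mu) * mu ** x / factorial(x)


def poisson_cdf(x: int, mu: float) -> float:
    """F(x; mu) = P(X <= x)."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))


mu = 20
part_a = poisson_cdf(10, mu)
part_b = 1 - poisson_cdf(20, mu)
part_c_inclusive = poisson_cdf(20, mu) - poisson_cdf(9, mu)    # P(10 <= X <= 20)
part_c_strict = poisson_cdf(19, mu) - poisson_cdf(10, mu)      # P(10 < X < 20)

sigma = sqrt(mu)
# Integers strictly between mu - 2*sigma = 11.056 and mu + 2*sigma = 28.944 are 12..28.
part_d = sum(poisson_pmf(k, mu) for k in range(12, 29))

print(part_a, part_b, part_c_inclusive, part_c_strict, part_d)
```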
83. The exact distribution of X is binomial with n = 1000 and p = 1/200; we can approximate this distribution
by the Poisson distribution with μ = np = 5.
a. P(5 ≤ X ≤ 8) = F(8; 5) − F(4; 5) = .492.

b. P(X ≥ 8) = 1 − P(X ≤ 7) = 1 − F(7; 5) = 1 − .867 = .133.


85.
a. μ = 8 when t = 1, so P(X = 6) = e⁻⁸·8⁶/6! = .122; P(X ≥ 6) = 1 − F(5; 8) = .809; and
P(X ≥ 10) = 1 − F(9; 8) = .283.
b. t = 90 min = 1.5 hours, so μ = 12; thus the expected number of arrivals is 12 and the standard deviation
is σ = √12 = 3.464.
c. t = 2.5 hours implies that μ = 20. So, P(X ≥ 20) = 1 − F(19; 20) = .530 and
P(X ≤ 10) = F(10; 20) = .011.
87.
a. For a two hour period the parameter of the distribution is μ = αt = (4)(2) = 8,
so P(X = 10) = e⁻⁸·8¹⁰/10! = .099.
b. For a 30-minute period, αt = (4)(.5) = 2, so P(X = 0) = e⁻²·2⁰/0! = .135.
c. The expected value is simply E(X) = αt = 2.
89. In this example, α = rate of occurrence = 1/(mean time between occurrences) = 1/.5 = 2.
a. For a two-year period, μ = αt = (2)(2) = 4 loads.
b. Apply a Poisson model with μ = 4: P(X > 5) = 1 − P(X ≤ 5) = 1 − F(5; 4) = 1 − .785 = .215.
c. For α = 2 and the value of t unknown, P(no loads occur during the period of length t) =
P(X = 0) = e^(−2t)·(2t)⁰/0! = e^(−2t). Solve for t: e^(−2t) ≤ .1 ⇒ −2t ≤ ln(.1) ⇒ t ≥ 1.1513 years.
91.
a. For a quarter-acre (.25 acre) plot, the mean parameter is μ = (80)(.25) = 20, so P(X ≤ 16) = F(16; 20) =
.221.

b. The expected number of trees is α·(area) = (80 trees/acre)(85,000 acres) = 6,800,000 trees.
c. The area of the circle is πr² = π(.1)² = .01π = .031416 square miles, which is equivalent to
.031416(640) = 20.106 acres. Thus X has a Poisson distribution with parameter μ = α(20.106) =
80(20.106) = 1608.5. That is, the pmf of X is the function p(x; 1608.5).
93.
a. No events occur in the time interval (0, t + Δt) if and only if no events occur in (0, t) and no events
occur in (t, t + Δt). Since it’s assumed the numbers of events in non-overlapping intervals are
independent (Assumption 3),

P(no events in (0, t + Δt)) = P(no events in (0, t)) · P(no events in (t, t + Δt)) ⇒

P₀(t + Δt) = P₀(t) · P(no events in (t, t + Δt)) = P₀(t) · [1 − αΔt − o(Δt)] by Assumption 2.


b. Rewrite a as P₀(t + Δt) = P₀(t) − P₀(t)[αΔt + o(Δt)], so P₀(t + Δt) − P₀(t) = −P₀(t)[αΔt + o(Δt)] and
[P₀(t + Δt) − P₀(t)]/Δt = −αP₀(t) − P₀(t)·o(Δt)/Δt. Since o(Δt)/Δt → 0 as Δt → 0 and the left-hand side of the
equation converges to dP₀(t)/dt as Δt → 0, we find that

dP₀(t)/dt = −αP₀(t).


c. Let P₀(t) = e^(−αt). Then dP₀(t)/dt = d/dt[e^(−αt)] = −αe^(−αt) = −αP₀(t), as desired. (This suggests that the
probability of zero events in (0, t) for a process defined by Assumptions 1-3 is equal to e^(−αt).)


d. Similarly, the product rule implies
d/dt[e^(−αt)(αt)^k/k!] = −αe^(−αt)(αt)^k/k! + kα·e^(−αt)(αt)^(k−1)/k!
= −α·e^(−αt)(αt)^k/k! + α·e^(−αt)(αt)^(k−1)/(k − 1)! = −αP_k(t) + αP_(k−1)(t), as desired.


Supplementary Exercises 


95.
a. We’ll find p(1) and p(4) first, since they’re easiest, then p(2). We can then find p(3) by subtracting the
others from 1.

p(1) = P(exactly one suit) = P(all ♠) + P(all ♥) + P(all ♦) + P(all ♣) =
4·P(all ♠) = 4·C(13, 5)·C(39, 0)/C(52, 5) = .00198, since there are 13 ♠s and 39 other cards.

p(4) = 4·P(2♠, 1♥, 1♦, 1♣) = 4·C(13, 2)·C(13, 1)·C(13, 1)·C(13, 1)/C(52, 5) = .26375.

p(2) = P(all ♥s and ♠s, with ≥ one of each) + ... + P(all ♦s and ♣s, with ≥ one of each) =
C(4, 2)·P(all ♥s and ♠s, with ≥ one of each) =
6·[P(1 ♥ and 4 ♠) + P(2 ♥ and 3 ♠) + P(3 ♥ and 2 ♠) + P(4 ♥ and 1 ♠)] =
6·[2·C(13, 4)·C(13, 1)/C(52, 5) + 2·C(13, 3)·C(13, 2)/C(52, 5)] = 6·[(18,590 + 44,616)/2,598,960] = .14592.

Finally, p(3) = 1 − [p(1) + p(2) + p(4)] = .58835.

b. μ = Σ x·p(x) (sum over x = 1, ..., 4) = 3.114; σ² = [Σ x²·p(x)] − (3.114)² = .405 ⇒ σ = .636.
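The counting argument for the number of suits in a five-card hand can be verified with exact binomial coefficients. A minimal sketch; the variable names are illustrative, and p(3) is obtained by subtraction exactly as in the solution.

```python
from math import comb

hands = comb(52, 5)  # 2,598,960 equally likely five-card hands

# One suit: choose the suit (4 ways), then 5 of its 13 cards.
p1 = 4 * comb(13, 5) / hands

# Four suits: choose which suit contributes 2 cards (4 ways), one card from each other suit.
p4 = 4 * comb(13, 2) * comb(13, 1) ** 3 / hands

# Two suits: for one specific pair of suits, splits 1-4 and 4-1, then 2-3 and 3-2.
hands_for_one_pair = 2 * comb(13, 4) * comb(13, 1) + 2 * comb(13, 3) * comb(13, 2)
p2 = comb(4, 2) * hands_for_one_pair / hands

p3 = 1 - (p1 + p2 + p4)

pmf = {1: p1, 2: p2, 3: p3, 4: p4}
mean = sum(x * p for x, p in pmf.items())
variance = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2
sd = variance ** 0.5

print(pmf)
print(mean, variance, sd)
```

Because the solution rounds each probability before subtracting, its last digit for p(3) can differ slightly from the unrounded value printed here.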


97.
a. From the description, X ~ Bin(15, .75). So, the pmf of X is b(x; 15, .75).

b. P(X > 10) = 1 − P(X ≤ 10) = 1 − B(10; 15, .75) = 1 − .314 = .686.
c. P(6 ≤ X ≤ 10) = B(10; 15, .75) − B(5; 15, .75) = .314 − .001 = .313.
d. μ = (15)(.75) = 11.25, σ² = (15)(.75)(.25) = 2.81.

e. Requests can all be met if and only if X ≤ 10 and 15 − X ≤ 8, i.e., iff 7 ≤ X ≤ 10. So,
P(all requests met) = P(7 ≤ X ≤ 10) = B(10; 15, .75) − B(6; 15, .75) = .310.


99. Let X = the number of components out of 5 that function, so X ~ Bin(5, .9). Then a 3-out-of-5 system
works when X is at least 3, and P(X ≥ 3) = 1 − P(X ≤ 2) = 1 − B(2; 5, .9) = .991.


101.
a. X ~ Bin(n = 500, p = .005). Since n is large and p is small, X can be approximated by a Poisson
distribution with μ = np = 2.5. The approximate pmf of X is p(x; 2.5) = e^(−2.5)·2.5^x/x!.

b. P(X = 5) = e^(−2.5)·2.5⁵/5! = .0668.

c. P(X ≥ 5) = 1 − P(X ≤ 4) = 1 − F(4; 2.5) = 1 − .8912 = .1088.


103. Let Y denote the number of tests carried out.
For n = 3, possible Y values are 1 and 4. P(Y = 1) = P(no one has the disease) = (.9)³ = .729 and P(Y = 4) =
1 − .729 = .271, so E(Y) = (1)(.729) + (4)(.271) = 1.813, as contrasted with the 3 tests necessary without
group testing.
For n = 5, possible values of Y are 1 and 6. P(Y = 1) = P(no one has the disease) = (.9)⁵ = .5905, so
P(Y = 6) = 1 − .5905 = .4095 and E(Y) = (1)(.5905) + (6)(.4095) = 3.0475, less than the 5 tests necessary
without group testing.
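The expected number of tests generalizes to any group size. A minimal sketch, assuming each person independently has the disease with probability .1 (so the disease-free probability is .9, as in the calculation above); `expected_tests` is a hypothetical helper name.

```python
def expected_tests(group_size: int, p_disease: float) -> float:
    """Expected number of tests under group testing.

    One pooled test suffices if nobody has the disease; otherwise the pooled
    test is followed by one test per person (group_size + 1 tests in total).
    """
    p_all_clear = (1 - p_disease) ** group_size
    return 1 * p_all_clear + (group_size + 1) * (1 - p_all_clear)


for n in (3, 5):
    print(n, expected_tests(n, 0.1))
```

The n = 5 value printed here uses the unrounded (.9)⁵ = .59049, so it differs from 3.0475 in the fifth decimal.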
105. p(2) = P(X = 2) = P(SS) = p², and p(3) = P(FSS) = (1 − p)p².

For x ≥ 4, consider the first x − 3 trials and the last 3 trials separately. To have X = x, it must be the case
that the last three trials were FSS, and that two-successes-in-a-row was not already seen in the first x − 3
tries.

The probability of the first event is simply (1 − p)p².

The second event occurs if two-in-a-row hadn’t occurred after 2 or 3 or ... or x − 3 tries. The probability of
this second event equals 1 − [p(2) + p(3) + ... + p(x − 3)]. (For x = 4, the probability in brackets is empty;
for x = 5, it’s p(2); for x = 6, it’s p(2) + p(3); and so on.)

Finally, since trials are independent, P(X = x) = (1 − [p(2) + ... + p(x − 3)]) · (1 − p)p².


For p = .9, the pmf of X up to x = 8 is shown below.

x      2     3      4      5      6      7      8
p(x)   .81   .081   .081   .0154  .0088  .0023  .0010


So, P(X ≤ 8) = p(2) + ... + p(8) = .9995.
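The recursive formula translates directly into a short loop. A minimal sketch; `pmf_table` is a name introduced here, and it builds p(2), p(3), ... in order so that each new value can use the earlier ones.

```python
def pmf_table(p: float, max_x: int) -> dict:
    """pmf of X = number of trials until two successes in a row first occur.

    Uses P(X = x) = (1 - [p(2) + ... + p(x - 3)]) * (1 - p) * p**2 for x >= 4.
    """
    pmf = {2: p ** 2, 3: (1 - p) * p ** 2}
    for x in range(4, max_x + 1):
        # range(2, x - 2) covers 2, ..., x - 3 and is empty when x = 4.
        already_done = sum(pmf[k] for k in range(2, x - 2))
        pmf[x] = (1 - already_done) * (1 - p) * p ** 2
    return pmf


table = pmf_table(0.9, 8)
for x, prob in table.items():
    print(x, round(prob, 4))
print(round(sum(table.values()), 4))
```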


107.
a. Let event A = seed carries single spikelets, and event B = seed produces ears with single spikelets.
Then P(A ∩ B) = P(A) · P(B | A) = (.40)(.29) = .116.
Next, let X = the number of seeds out of the 10 selected that meet the condition A ∩ B. Then X ~
Bin(10, .116). So, P(X = 5) = C(10, 5)(.116)⁵(.884)⁵ = .002857.
b. For any one seed, the event of interest is B = seed produces ears with single spikelets. Using the
law of total probability, P(B) = P(A ∩ B) + P(A′ ∩ B) = (.40)(.29) + (.60)(.26) = .272.
Next, let Y = the number out of the 10 seeds that meet condition B. Then Y ~ Bin(10, .272). P(Y = 5) =
C(10, 5)(.272)⁵(1 − .272)⁵ = .0767, while
P(Y ≤ 5) = Σ C(10, y)(.272)^y·(1 − .272)^(10−y) (sum over y = 0, ..., 5) = .041813 + ... + .076719 = .97024.
109.
a. P(X = 0) = F(0; 2) or e⁻²·2⁰/0! = 0.135.

b. Let S = an operator who receives no requests. Then the number of operators that receive no requests
follows a Bin(n = 5, p = .135) distribution. So, P(4 S’s in 5 trials) = b(4; 5, .135) =
C(5, 4)(.135)⁴(.865)¹ = .00144.

c. For any non-negative integer x, P(all operators receive exactly x requests) =
P(first operator receives x) · ... · P(fifth operator receives x) = [p(x; 2)]⁵ = [e⁻²·2^x/x!]⁵ = e⁻¹⁰·2^(5x)/(x!)⁵.
Then, P(all receive the same number) = P(all receive 0 requests) + P(all receive 1 request) + P(all
receive 2 requests) + ... = Σ e⁻¹⁰·2^(5x)/(x!)⁵ (sum over x = 0, 1, 2, ...).


111. The number of magazine copies sold is X so long as X is no more than five; otherwise, all five copies are
sold. So, mathematically, the number sold is min(X, 5), and
E[min(X, 5)] = Σ min(x, 5)·p(x; 4) (sum over x = 0, 1, 2, ...) = 0p(0; 4) +
1p(1; 4) + 2p(2; 4) + 3p(3; 4) + 4p(4; 4) + Σ 5p(x; 4) (sum over x = 5, 6, ...) =
1.735 + 5·Σ p(x; 4) (sum over x ≥ 5) = 1.735 + 5[1 − Σ p(x; 4) (sum over x = 0, ..., 4)] = 1.735 + 5[1 − F(4; 4)] = 3.59.
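The split of the infinite sum into a finite head plus a tail probability can be mirrored in code. A minimal sketch for demand X ~ Poisson(4) and a stock of five copies; the variable names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """p(x; mu) = e^(-mu) mu^x / x!."""
    return exp(-mu) * mu ** x / factorial(x)


mu, stock = 4, 5

# Head: demand below the stock level, so the number sold equals the demand.
head = sum(x * poisson_pmf(x, mu) for x in range(stock))

# Tail: demand of 5 or more, so exactly `stock` copies are sold.
tail_probability = 1 - sum(poisson_pmf(x, mu) for x in range(stock))

expected_sold = head + stock * tail_probability
print(head, tail_probability, expected_sold)
```

Replacing the infinite tail by 1 − F(4; 4) is what makes the expectation computable from a finite table.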
113.
a. No, since the probability of a “success” is not the same for all tests.
b. There are four ways exactly three could have positive results. Let D represent those with the disease
and D′ represent those without the disease.

Combination      Probability
D    D′
0    3           [C(5, 0)(.2)⁰(.8)⁵]·[C(5, 3)(.9)³(.1)²] = (.32768)(.0729) = .02389
1    2           [C(5, 1)(.2)¹(.8)⁴]·[C(5, 2)(.9)²(.1)³] = (.4096)(.0081) = .00332
2    1           [C(5, 2)(.2)²(.8)³]·[C(5, 1)(.9)¹(.1)⁴] = (.2048)(.00045) = .00009216
3    0           [C(5, 3)(.2)³(.8)²]·[C(5, 0)(.9)⁰(.1)⁵] = (.0512)(.00001) = .000000512

Adding up the probabilities associated with the four combinations yields 0.0273.
115.
a. Notice that p(x; μ₁, μ₂) = .5 p(x; μ₁) + .5 p(x; μ₂), where both terms p(x; μᵢ) are Poisson pmfs. Since both
pmfs are ≥ 0, so is p(x; μ₁, μ₂). That verifies the first requirement.

Next, Σ p(x; μ₁, μ₂) = .5·Σ p(x; μ₁) + .5·Σ p(x; μ₂) = .5 + .5 = 1 (all sums over x = 0, 1, 2, ...), so the second
requirement for a pmf is met. Therefore, p(x; μ₁, μ₂) is a valid pmf.

b. E(X) = Σ x·p(x; μ₁, μ₂) = Σ x·[.5p(x; μ₁) + .5p(x; μ₂)] = .5·Σ x·p(x; μ₁) + .5·Σ x·p(x; μ₂) = .5E(X₁) +
.5E(X₂), where Xᵢ ~ Poisson(μᵢ). Therefore, E(X) = .5μ₁ + .5μ₂.
c. This requires using the variance shortcut. Using the same method as in b,
E(X²) = .5·Σ x²·p(x; μ₁) + .5·Σ x²·p(x; μ₂) = .5E(X₁²) + .5E(X₂²). For any Poisson rv,
E(X²) = V(X) + [E(X)]² = μ + μ², so E(X²) = .5(μ₁ + μ₁²) + .5(μ₂ + μ₂²).
Finally, V(X) = .5(μ₁ + μ₁²) + .5(μ₂ + μ₂²) − [.5μ₁ + .5μ₂]², which can be simplified to equal .5μ₁ + .5μ₂
+ .25(μ₁ − μ₂)².

d. Simply replace the weights .5 and .5 with .6 and .4, so p(x; μ₁, μ₂) = .6 p(x; μ₁) + .4 p(x; μ₂).
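The mean and variance formulas for the equal-weight mixture can be checked against a direct numerical summation. This sketch assumes illustrative parameters μ₁ = 2 and μ₂ = 5 (not taken from the exercise) and truncates the infinite sums at x = 99, where the remaining probability is negligible.

```python
from math import exp, factorial, isclose


def poisson_pmf(x: int, mu: float) -> float:
    return exp(-mu) * mu ** x / factorial(x)


def mixture_pmf(x: int, mu1: float, mu2: float, weight: float = 0.5) -> float:
    """weight * p(x; mu1) + (1 - weight) * p(x; mu2)."""
    return weight * poisson_pmf(x, mu1) + (1 - weight) * poisson_pmf(x, mu2)


mu1, mu2 = 2.0, 5.0          # assumed values, for illustration only
support = range(100)          # truncation point; tail mass beyond it is negligible

total = sum(mixture_pmf(x, mu1, mu2) for x in support)
mean = sum(x * mixture_pmf(x, mu1, mu2) for x in support)
variance = sum(x ** 2 * mixture_pmf(x, mu1, mu2) for x in support) - mean ** 2

formula_mean = 0.5 * mu1 + 0.5 * mu2
formula_variance = 0.5 * mu1 + 0.5 * mu2 + 0.25 * (mu1 - mu2) ** 2

print(total, mean, variance)
print(formula_mean, formula_variance)
```

The extra term .25(μ₁ − μ₂)² is why the mixture's variance exceeds its mean whenever μ₁ ≠ μ₂, unlike a single Poisson distribution.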
117. P(X = j) = Σ P(arm on track i ∩ X = j) = Σ P(X = j | arm on i)·pᵢ =
Σ P(next seek at i + j + 1 or i − j − 1)·pᵢ = Σ (p_(i+j+1) + p_(i−j−1))·pᵢ, where each sum runs over i = 1, ..., 10
and in the summation we take p_k = 0 if k ≤ 0 or k > 10.


119. Using the hint,
Σ (x − μ)²·p(x) (all x) ≥ Σ (x − μ)²·p(x) (x: |x − μ| ≥ kσ) ≥ Σ (kσ)²·p(x) (x: |x − μ| ≥ kσ) = k²σ²·Σ p(x) (x: |x − μ| ≥ kσ).
The left-hand side is, by definition, σ². On the other hand, the summation on the right-hand side represents
P(|X − μ| ≥ kσ).
So σ² ≥ k²σ²·P(|X − μ| ≥ kσ), whence P(|X − μ| ≥ kσ) ≤ 1/k².


121. 


a. Let A₁ = {voice}, A₂ = {data}, and X = duration of a call. Then E(X) = E(X|A₁)P(A₁) + E(X|A₂)P(A₂) =
3(.75) + 1(.25) = 2.5 minutes.

b. Let X = the number of chips in a cookie. Then E(X) = E(X| i = 1)P(i = 1) + E(X| i = 2)P(i = 2) +
E(X| i = 3)P(i = 3). If X is Poisson, then its mean is the specified μ; that is, E(X| i) = i + 1. Therefore,
E(X) = 2(.20) + 3(.50) + 4(.30) = 3.1 chips.
CHAPTER 4


Section 4.1 


1.
a. The pdf is the straight-line function graphed below on [3, 5]. The function is clearly non-negative; to
verify its integral equals 1, compute:
∫ (.075x + .2) dx over [3, 5] = [.0375x² + .2x] from 3 to 5 = (.0375(5)² + .2(5)) − (.0375(3)² + .2(3))
= 1.9375 − .9375 = 1

b. P(X ≤ 4) = ∫ (.075x + .2) dx over [3, 4] = [.0375x² + .2x] from 3 to 4 = (.0375(4)² + .2(4)) − (.0375(3)² + .2(3))
= 1.4 − .9375 = .4625. Since X is a continuous rv, P(X < 4) = P(X ≤ 4) = .4625 as well.

c. P(3.5 ≤ X ≤ 4.5) = ∫ (.075x + .2) dx over [3.5, 4.5] = [.0375x² + .2x] from 3.5 to 4.5 = ⋯ = .5.

d. P(4.5 < X) = P(4.5 ≤ X) = ∫ (.075x + .2) dx over [4.5, 5] = [.0375x² + .2x] from 4.5 to 5 = ⋯ = .278125.
3.
P(X > 0) = ∫ .09375(4 − x²) dx over [0, 2] = .09375·[4x − x³/3] from 0 to 2 = .5.
This matches the symmetry of the pdf about x = 0.

P(−1 < X < 1) = ∫ .09375(4 − x²) dx over [−1, 1] = .6875.

P(X < −.5 or X > .5) = 1 − P(−.5 ≤ X ≤ .5) = 1 − ∫ .09375(4 − x²) dx over [−.5, .5] = 1 − .3672 = .6328.

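5.
1 = ∫ f(x) dx over (−∞, ∞) = ∫ kx² dx over [0, 2] = [kx³/3] from 0 to 2 = 8k/3 ⇒ k = 3/8.

[Figure: graph of the pdf f(x) = (3/8)x² on [0, 2]]

P(0 ≤ X ≤ 1) = ∫ (3/8)x² dx over [0, 1] = [(1/8)x³] from 0 to 1 = 1/8 = .125

P(1 ≤ X ≤ 1.5) = ∫ (3/8)x² dx over [1, 1.5] = [(1/8)x³] from 1 to 1.5 = (1/8)(1.5)³ − (1/8)(1)³ = 19/64 = .296875

P(X ≥ 1.5) = ∫ (3/8)x² dx over [1.5, 2] = [(1/8)x³] from 1.5 to 2 = (1/8)(2)³ − (1/8)(1.5)³ = .578125.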
15 


7.
a. f(x) = 1/(B − A) = 1/(4.25 − .20) = 1/4.05 for .20 ≤ x ≤ 4.25 and = 0 otherwise.

[Figure: distribution plot of the uniform pdf on [.20, 4.25]]

b. P(X > 3) = ∫ (1/4.05) dx over [3, 4.25] = 1.25/4.05 = .309

c. P(μ − 1 ≤ X ≤ μ + 1) = ∫ (1/4.05) dx over [μ − 1, μ + 1] = 2/4.05 = .494. (We don’t actually need to know μ here, but it’s clearly
the midpoint of 2.225 mm by symmetry.)

d. P(a ≤ X ≤ a + 1) = ∫ (1/4.05) dx over [a, a + 1] = 1/4.05 = .247.


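9.
a. P(X ≤ 5) = ∫ .15e^(−.15(x−1)) dx over [1, 5] = .15·∫ e^(−.15u) du over [0, 4] (after the substitution u = x − 1)
= [−e^(−.15u)] from 0 to 4 = 1 − e^(−.6) = .451. P(X > 5) = 1 − P(X ≤ 5) = 1 − .451 = .549.

b. P(2 ≤ X ≤ 5) = ∫ .15e^(−.15(x−1)) dx over [2, 5] = ∫ .15e^(−.15u) du over [1, 4] = [−e^(−.15u)] from 1 to 4 = .312.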


Section 4.2 


11.
a. P(X ≤ 1) = F(1) = 1²/4 = .25.

b. P(.5 ≤ X ≤ 1) = F(1) − F(.5) = 1²/4 − (.5)²/4 = .1875.

c. P(X > 1.5) = 1 − P(X ≤ 1.5) = 1 − F(1.5) = 1 − (1.5)²/4 = .4375.

d. .5 = F(μ̃) = μ̃²/4 ⇒ μ̃² = 2 ⇒ μ̃ = √2 ≈ 1.414.

e. f(x) = F′(x) = x/2 for 0 ≤ x < 2, and = 0 otherwise.

f. E(X) = ∫ x·f(x) dx over (−∞, ∞) = ∫ x·(x/2) dx over [0, 2] = (1/2)·∫ x² dx over [0, 2] = [x³/6] from 0 to 2 = 8/6 ≈ 1.333.

g. E(X²) = ∫ x²·f(x) dx over (−∞, ∞) = ∫ x²·(x/2) dx over [0, 2] = (1/2)·∫ x³ dx over [0, 2] = [x⁴/8] from 0 to 2 = 2, so V(X) = E(X²) − [E(X)]² =
2 − (8/6)² = 8/36 ≈ .222, and σ_X = √.222 = .471.

h. From g, E(X²) = 2.
13.
a. 1 = ∫ (k/x⁴) dx over [1, ∞) = k·∫ x⁻⁴ dx over [1, ∞) = [(k/(−3))·x⁻³] from 1 to ∞ = 0 − (k/(−3))(1)⁻³ = k/3 ⇒ k = 3.
b. For x ≥ 1, F(x) = ∫ f(y) dy over (−∞, x] = ∫ (3/y⁴) dy over [1, x] = [−y⁻³] from 1 to x = −x⁻³ + 1 = 1 − 1/x³. For x < 1, F(x) = 0 since the
distribution begins at 1. Put together,
F(x) = 0 for x < 1 and F(x) = 1 − 1/x³ for 1 ≤ x.
c. P(X > 2) = 1 − F(2) = 1 − 7/8 = 1/8 or .125;
P(2 < X < 3) = F(3) − F(2) = (1 − 1/27) − (1 − 1/8) = .963 − .875 = .088.
d. The mean is E(X) = ∫ x·(3/x⁴) dx over [1, ∞) = ∫ (3/x³) dx over [1, ∞) = [−(3/2)x⁻²] from 1 to ∞ = 0 + 3/2 = 1.5. Next,
E(X²) = ∫ x²·(3/x⁴) dx over [1, ∞) = ∫ (3/x²) dx over [1, ∞) = [−3x⁻¹] from 1 to ∞ = 0 + 3 = 3, so V(X) = 3 − (1.5)² = .75. Finally, the
standard deviation of X is σ = √.75 = .866.
e. P(1.5 − .866 < X < 1.5 + .866) = P(.634 < X < 2.366) = F(2.366) − F(.634) = .9245 − 0 = .9245.
15.
a. Since X is limited to the interval (0, 1), F(x) = 0 for x ≤ 0 and F(x) = 1 for x ≥ 1.
For 0 < x < 1,
F(x) = ∫ f(y) dy over (−∞, x] = ∫ 90y⁸(1 − y) dy over [0, x] = ∫ (90y⁸ − 90y⁹) dy over [0, x] = [10y⁹ − 9y¹⁰] from 0 to x = 10x⁹ − 9x¹⁰.
The graphs of the pdf and cdf of X appear below.

[Figure: graphs of the pdf f(x) and the cdf F(x) of X for 0 ≤ x ≤ 1]

b. F(.5) = 10(.5)⁹ − 9(.5)¹⁰ = .0107.

c. P(.25 < X ≤ .5) = F(.5) − F(.25) = .0107 − [10(.25)⁹ − 9(.25)¹⁰] = .0107 − .0000 = .0107.
Since X is continuous, P(.25 ≤ X ≤ .5) = P(.25 < X ≤ .5) = .0107.

d. The 75th percentile is the value of x for which F(x) = .75: 10x⁹ − 9x¹⁰ = .75 ⇒ x = .9036 using software.
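The percentile equation has no closed-form solution, which is why software is needed. One way to solve it is bisection, sketched below under the assumption that F(x) = 10x⁹ − 9x¹⁰ on [0, 1] as derived in part a; this is an illustrative method, not necessarily the one used to obtain .9036.

```python
def cdf(x: float) -> float:
    """F(x) = 10x^9 - 9x^10 for 0 <= x <= 1."""
    return 10 * x ** 9 - 9 * x ** 10


def percentile(p: float, lo: float = 0.0, hi: float = 1.0, tol: float = 1e-10) -> float:
    """Solve F(x) = p by bisection; valid because F is increasing on [0, 1]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid   # the root lies to the right of mid
        else:
            hi = mid   # the root lies at or to the left of mid
    return (lo + hi) / 2


x75 = percentile(0.75)
print(round(cdf(0.5), 4), round(x75, 4))
```

Bisection only needs the cdf to be continuous and increasing, so the same helper works for other percentiles of this distribution.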
e. E(X) = ∫ x·f(x) dx over (−∞, ∞) = ∫ x·90x⁸(1 − x) dx over [0, 1] = ∫ (90x⁹ − 90x¹⁰) dx over [0, 1] = [9x¹⁰ − (90/11)x¹¹] from 0 to 1 = 9 − 90/11 = 9/11 = .8182.
Similarly, E(X²) = ∫ x²·f(x) dx over (−∞, ∞) = ∫ x²·90x⁸(1 − x) dx over [0, 1] = ... = .6818, from which V(X) = .6818 −
(.8182)² = .0124 and σ_X = .11134.

f. μ ± σ = (.7068, .9295). Thus, P(μ − σ ≤ X ≤ μ + σ) = F(.9295) − F(.7068) = .8465 − .1602 = .6863, and
the probability X is more than 1 standard deviation from its mean value equals 1 − .6863 = .3137.

17.
a. To find the (100p)th percentile, set F(x) = p and solve for x:
p = F(x) = (x − A)/(B − A) ⇒ x = A + (B − A)p.
b. E(X) = ∫ x·[1/(B − A)] dx over [A, B] = (A + B)/2, the midpoint of the interval. Also,
E(X²) = (A² + AB + B²)/3, from which V(X) = E(X²) − [E(X)]² = ... = (B − A)²/12. Finally,
σ_X = √V(X) = (B − A)/√12.
c. E(Xⁿ) = ∫ xⁿ·[1/(B − A)] dx over [A, B] = [1/(B − A)]·[x^(n+1)/(n + 1)] from A to B = [B^(n+1) − A^(n+1)]/[(n + 1)(B − A)].
19.
a. P(X ≤ 1) = F(1) = .25[1 + ln(4)] = .597.
b. P(1 ≤ X ≤ 3) = F(3) − F(1) = .966 − .597 = .369.
c. For x < 0 or x > 4, the pdf is f(x) = 0 since X is restricted to (0, 4). For 0 < x < 4, take the first derivative
of the cdf:
F(x) = (x/4)·[1 + ln(4/x)] = (1/4)x + (ln(4)/4)x − (1/4)x·ln(x) ⇒
f(x) = F′(x) = 1/4 + ln(4)/4 − (1/4)ln(x) − (1/4)x·(1/x) = ln(4)/4 − (1/4)ln(x) = .3466 − .25 ln(x)
21. E(area) = E(πR²) = ∫ πr²·f(r) dr over (−∞, ∞) = ∫ πr²·(3/4)(1 − (10 − r)²) dr over [9, 11] = ⋯ = (501/5)π = 314.79 m².
= 9 4 5 


23. With X = temperature in °C, the temperature in °F equals 1.8X + 32, so the mean and standard deviation in
°F are 1.8μ_X + 32 = 1.8(120) + 32 = 248°F and |1.8|σ_X = 1.8(2) = 3.6°F. Notice that the additive constant,
32, affects the mean but does not affect the standard deviation.
25.
a. P(Y ≤ 1.8μ̃ + 32) = P(1.8X + 32 ≤ 1.8μ̃ + 32) = P(X ≤ μ̃) = .5 since μ̃ is the median of X. This
shows that 1.8μ̃ + 32 is the median of Y.

b. The 90th percentile for Y equals 1.8η(.9) + 32, where η(.9) is the 90th percentile for X. To see this, P(Y
≤ 1.8η(.9) + 32) = P(1.8X + 32 ≤ 1.8η(.9) + 32) = P(X ≤ η(.9)) = .9, since η(.9) is the 90th percentile of
X. This shows that 1.8η(.9) + 32 is the 90th percentile of Y.

c. When Y = aX + b (i.e., a linear transformation of X) and the (100p)th percentile of the X distribution is
η(p), then the corresponding (100p)th percentile of the Y distribution is a·η(p) + b. This can be
demonstrated using the same technique as in a and b above.


27. Since X is uniform on [0, 360], E(X) = (0 + 360)/2 = 180° and σ_X = (360 − 0)/√12 = 103.92°. Using the suggested
linear representation of Y, E(Y) = (2π/360)μ_X − π = (2π/360)(180) − π = 0 radians, and σ_Y = (2π/360)σ_X =
1.814 radians. (In fact, Y is uniform on [−π, π].)


Section 4.3 


29.
a. .9838 is found in the 2.1 row and the .04 column of the standard normal table, so c = 2.14.

b. P(0 ≤ Z ≤ c) = .291 ⇒ Φ(c) − Φ(0) = .2910 ⇒ Φ(c) − .5 = .2910 ⇒ Φ(c) = .7910 ⇒ from the standard
normal table, c = .81.

c. P(c ≤ Z) = .121 ⇒ 1 − P(Z < c) = .121 ⇒ 1 − Φ(c) = .121 ⇒ Φ(c) = .879 ⇒ c = 1.17.

d. P(−c ≤ Z ≤ c) = Φ(c) − Φ(−c) = Φ(c) − (1 − Φ(c)) = 2Φ(c) − 1 = .668 ⇒ Φ(c) = .834 ⇒
c = 0.97.

e. P(c ≤ |Z|) = 1 − P(|Z| < c) = 1 − [Φ(c) − Φ(−c)] = 1 − [2Φ(c) − 1] = 2 − 2Φ(c) = .016 ⇒ Φ(c) = .992 ⇒
c = 2.41.


31. By definition, z_α satisfies α = P(Z ≥ z_α) = 1 − P(Z < z_α) = 1 − Φ(z_α), or Φ(z_α) = 1 − α.
a. Φ(z_.0055) = 1 − .0055 = .9945 ⇒ z_.0055 = 2.54.
b. Φ(z_.09) = .91 ⇒ z_.09 ≈ 1.34.
c. Φ(z_.663) = .337 ⇒ z_.663 ≈ −.42.


33.
a. P(X ≤ 50) = P(Z ≤ (50 − 46.8)/1.75) = P(Z ≤ 1.83) = Φ(1.83) = .9664.

b. P(X ≥ 48) = P(Z ≥ (48 − 46.8)/1.75) = P(Z ≥ 0.69) = 1 − Φ(0.69) = 1 − .7549 = .2451.

c. The mean and standard deviation aren’t important here. The probability a normal random variable is
within 1.5 standard deviations of its mean equals P(−1.5 ≤ Z ≤ 1.5) =
Φ(1.5) − Φ(−1.5) = .9332 − .0668 = .8664.
Chapter 4: Continuous Random Variables and Probability Distributions


35.
a. P(X ≥ 10) = P(Z ≥ .43) = 1 − Φ(.43) = 1 − .6664 = .3336.
Since X is continuous, P(X > 10) = P(X ≥ 10) = .3336.

b. P(X > 20) = P(Z > 4) ≈ 0.

c. P(5 ≤ X ≤ 10) = P(−1.36 ≤ Z ≤ .43) = Φ(.43) − Φ(−1.36) = .6664 − .0869 = .5795.

d. P(8.8 − c ≤ X ≤ 8.8 + c) = .98, so 8.8 − c and 8.8 + c are at the 1st and the 99th percentile of the given
distribution, respectively. The 99th percentile of the standard normal distribution satisfies Φ(z) = .99,
which corresponds to z = 2.33.
So, 8.8 + c = μ + 2.33σ = 8.8 + 2.33(2.8) ⇒ c = 2.33(2.8) = 6.524.

e. From a, P(X > 10) = .3336, so P(X ≤ 10) = 1 − .3336 = .6664. For four independent selections,
P(at least one diameter exceeds 10) = 1 − P(none of the four exceeds 10) =
1 − P(first doesn’t ∩ ... ∩ fourth doesn’t) = 1 − (.6664)(.6664)(.6664)(.6664) by independence =
1 − (.6664)⁴ = .8028.


37.
a. P(X = 105) = 0, since the normal distribution is continuous;
P(X < 105) = P(Z < 0.2) = P(Z ≤ 0.2) = Φ(0.2) = .5793;
P(X ≤ 105) = .5793 as well, since X is continuous.

b. No, the answer does not depend on μ or σ. For any normal rv, P(|X − μ| > σ) = P(|Z| > 1) =
P(Z < −1 or Z > 1) = 2P(Z < −1) by symmetry = 2Φ(−1) = 2(.1587) = .3174.

c. From the table, Φ(z) = .1% = .001 ⇒ z = −3.09 ⇒ x = 104 − 3.09(5) = 88.55 mmol/L. The smallest
.1% of chloride concentration values are those less than 88.55 mmol/L.


39. μ = 30 mm, σ = 7.8 mm
a. P(X ≤ 20) = P(Z ≤ −1.28) = .1003. Since X is continuous, P(X < 20) = .1003 as well.

b. Set Φ(z) = .75 to find z ≈ 0.67. That is, 0.67 is roughly the 75th percentile of a standard normal
distribution. Thus, the 75th percentile of X’s distribution is μ + 0.67σ = 30 + 0.67(7.8) = 35.226 mm.

c. Similarly, Φ(z) = .15 ⇒ z ≈ −1.04 ⇒ η(.15) = 30 − 1.04(7.8) = 21.888 mm.

d. The values in question are the 10th and 90th percentiles of the distribution (in order to have 80% in the
middle). Mimicking b and c, Φ(z) = .1 ⇒ z ≈ −1.28 and Φ(z) = .9 ⇒ z ≈ +1.28, so the 10th and 90th
percentiles are 30 ± 1.28(7.8) = 20.016 mm and 39.984 mm.


41. For a single drop, P(damage) = P(X < 100) = P(Z < (100 − 200)/30) = P(Z < −3.33) = .0004. So, the
probability of no damage on any single drop is 1 − .0004 = .9996, and
P(at least one among five is damaged) = 1 − P(none damaged) = 1 − (.9996)⁵ = 1 − .998 = .002.
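The two-step calculation (one normal tail probability, then the complement rule for five independent drops) can be reproduced with the standard library. This sketch assumes a mean of 200 and a standard deviation of 30, the values implied by the z-score −3.33 above; the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed parameters, inferred from z = (100 - 200)/30 = -3.33.
opening_altitude = NormalDist(mu=200, sigma=30)

p_damage_single = opening_altitude.cdf(100)            # P(X < 100) for one drop
p_no_damage_single = 1 - p_damage_single
p_at_least_one_of_five = 1 - p_no_damage_single ** 5   # independence across drops

print(round(p_damage_single, 4), round(p_at_least_one_of_five, 3))
```

`NormalDist.cdf` evaluates the normal cdf directly, so no standardization or table lookup is needed.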
Chapter 4: Continuous Random Variables and Probability Distributions

43.
a. Let μ and σ denote the unknown mean and standard deviation. The given information provides
.05 = P(X < 39.12) = Φ((39.12 − μ)/σ) ⇒ (39.12 − μ)/σ = −1.645 ⇒ 39.12 − μ = −1.645σ and
.10 = P(X > 73.24) = 1 − Φ((73.24 − μ)/σ) ⇒ (73.24 − μ)/σ = Φ^(−1)(.9) ≈ 1.28 ⇒ 73.24 − μ = 1.28σ.
Subtract the top equation from the bottom one to get 34.12 = 2.925σ, or σ ≈ 11.665 mph. Then,
substitute back into either equation to get μ ≈ 58.309 mph.

b. P(50 ≤ X ≤ 65) = Φ(.57) − Φ(−.72) = .7157 − .2358 = .4799.

c. P(X > 70) = 1 − Φ(1.00) = 1 − .8413 = .1587.

45. With μ = .500 inches, the acceptable range for the diameter is between .496 and .504 inches, so
unacceptable bearings will have diameters smaller than .496 or larger than .504.
The new distribution has μ = .499 and σ = .002.
P(X < .496 or X > .504) = P(Z < (.496 − .499)/.002) + P(Z > (.504 − .499)/.002) = P(Z < −1.5) + P(Z > 2.5) =
Φ(−1.5) + [1 − Φ(2.5)] = .073. 7.3% of the bearings will be unacceptable.

47. The stated condition implies that 99% of the area under the normal curve with μ = 12 and σ = 3.5 is to the
left of c − 1, so c − 1 is the 99th percentile of the distribution. Since the 99th percentile of the standard
normal distribution is z = 2.33, c − 1 = μ + 2.33σ = 20.155, and c = 21.155.

49.
a. P(X > 4000) = P(Z > (4000 − 3432)/482) = P(Z > 1.18) = 1 − Φ(1.18) = 1 − .8810 = .1190;
P(3000 < X < 4000) = P((3000 − 3432)/482 < Z < (4000 − 3432)/482) = Φ(1.18) − Φ(−.90) = .8810 − .1841 = .6969.

b. P(X < 2000 or X > 5000) = P(Z < (2000 − 3432)/482) + P(Z > (5000 − 3432)/482)
= Φ(−2.97) + [1 − Φ(3.25)] = .0015 + .0006 = .0021.

c. We will use the conversion 1 lb = 454 g, then 7 lbs = 3178 grams, and we wish to find
P(X > 3178) = P(Z > (3178 − 3432)/482) = 1 − Φ(−.53) = .7019.

d. We need the top .0005 and the bottom .0005 of the distribution. Using the z table, both .9995 and
.0005 have multiple z values, so we will use a middle value, ±3.295. Then 3432 ± 3.295(482) = 1844
and 5020. The most extreme .1% of all birth weights are less than 1844 g or more than 5020 g.

e. Converting to pounds yields a mean of 7.5595 lbs and a standard deviation of 1.0608 lbs. Then
P(X > 7) = P(Z > (7 − 7.5595)/1.0608) = 1 − Φ(−.53) = .7019. This yields the same answer as in part c.

51. P(|X − μ| ≥ σ) = 1 − P(|X − μ| < σ) = 1 − P(μ − σ < X < μ + σ) = 1 − P(−1 ≤ Z ≤ 1) = .3174.
Similarly, P(|X − μ| ≥ 2σ) = 1 − P(−2 < Z < 2) = .0456 and P(|X − μ| ≥ 3σ) = .0026.
These are considerably less than the bounds 1, .25, and .11 given by Chebyshev.
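The comparison with Chebyshev's inequality can be tabulated directly. This is a small sketch, assuming Python's standard-library `statistics.NormalDist`; the function names are made up for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def normal_tail(k):
    """P(|X - mu| >= k*sigma) for any normal rv, using symmetry of Z."""
    return 2 * Z.cdf(-k)

def chebyshev_bound(k):
    """Chebyshev's upper bound 1/k^2 on the same probability."""
    return 1 / k ** 2

for k in (1, 2, 3):
    print(k, round(normal_tail(k), 4), round(chebyshev_bound(k), 2))
```

For each k the exact normal probability sits well under the Chebyshev bound, which holds for every distribution and is therefore conservative.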
53.
p = .5 ⇒ μ = 12.5 & σ² = 6.25; p = .6 ⇒ μ = 15 & σ² = 6; p = .8 ⇒ μ = 20 and σ² = 4. These mean and
standard deviation values are used for the normal calculations below.

a. For the binomial calculation, P(15 ≤ X ≤ 20) = B(20; 25, p) − B(14; 25, p).

p     P(15 ≤ X ≤ 20)    P(14.5 ≤ Normal ≤ 20.5)
.5    = .212            = P(.80 ≤ Z ≤ 3.20) = .2112
.6    = .577            = P(−.20 ≤ Z ≤ 2.24) = .5668
.8    = .573            = P(−2.75 ≤ Z ≤ .25) = .5957

b. For the binomial calculation, P(X ≤ 15) = B(15; 25, p).

p     P(X ≤ 15)         P(Normal ≤ 15.5)
.5    = .885            = P(Z ≤ 1.20) = .8849
.6    = .575            = P(Z ≤ .20) = .5793
.8    = .017            = P(Z ≤ −2.25) = .0122

c. For the binomial calculation, P(X ≥ 20) = 1 − B(19; 25, p).

p     P(X ≥ 20)         P(Normal ≥ 19.5)
.5    = .002            = P(Z ≥ 2.80) = .0026
.6    = .029            = P(Z ≥ 1.84) = .0329
.8    = .617            = P(Z ≥ −0.25) = .5987

55. Use the normal approximation to the binomial, with a continuity correction. With p = .75 and n = 500,
μ = np = 375, and σ = 9.68. So, Bin(500, .75) ≈ N(375, 9.68).
a. P(360 ≤ X ≤ 400) = P(359.5 ≤ X ≤ 400.5) = P(−1.60 ≤ Z ≤ 2.63) = Φ(2.63) − Φ(−1.60) = .9409.

b. P(X < 400) = P(X ≤ 399.5) = P(Z ≤ 2.53) = Φ(2.53) = .9943.

57.
a. For any a > 0, F_Y(y) = P(Y ≤ y) = P(aX + b ≤ y) = P(X ≤ (y − b)/a) = F_X((y − b)/a). This, in turn, implies
f_Y(y) = d/dy F_Y(y) = d/dy F_X((y − b)/a) = (1/a) f_X((y − b)/a).
Now let X have a normal distribution. Applying this rule,
f_Y(y) = (1/a) · [1/(√(2π) σ)] exp(−((y − b)/a − μ)²/(2σ²)) = [1/(√(2π) aσ)] exp(−(y − b − aμ)²/(2a²σ²)). This is the pdf of a normal
distribution. In particular, from the exponent we can read that the mean of Y is E(Y) = aμ + b and the
variance of Y is V(Y) = a²σ². These match the usual rescaling formulas for mean and variance. (The
same result holds when a < 0.)

b. Temperature in °F would also be normal, with a mean of 1.8(115) + 32 = 239°F and a variance of
(1.8)²(2)² = 12.96 (i.e., a standard deviation of 3.6°F).
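The normal approximation with continuity correction used in Exercises 53 and 55 can be compared against exact binomial sums. The sketch below assumes only the Python standard library; `binom_cdf` and `normal_approx_cdf` are illustrative helper names, not functions from the text.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(x, n, p):
    """Exact B(x; n, p) = P(X <= x) for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

def normal_approx_cdf(x, n, p):
    """Normal approximation to P(X <= x), using the +0.5 continuity correction."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(x + 0.5)

n = 25
for p in (0.5, 0.6, 0.8):
    exact = binom_cdf(20, n, p) - binom_cdf(14, n, p)              # P(15 <= X <= 20)
    approx = normal_approx_cdf(20, n, p) - normal_approx_cdf(14, n, p)
    print(p, round(exact, 3), round(approx, 4))

# Exercise 55, part a: P(360 <= X <= 400) for n = 500, p = .75
ex55a = normal_approx_cdf(400, 500, 0.75) - normal_approx_cdf(359, 500, 0.75)
```

The approximation is closest at p = .5 and degrades as p moves toward 0 or 1, where the binomial distribution is more skewed.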
Section 4.4

59.
a. E(X) = 1/λ = 1.
b. σ = 1/λ = 1.
c. P(X ≤ 4) = 1 − e^(−(1)(4)) = 1 − e^(−4) = .982.
d. P(2 ≤ X ≤ 5) = (1 − e^(−(1)(5))) − (1 − e^(−(1)(2))) = e^(−2) − e^(−5) = .129.


61. Note that a mean value of 2.725 for the exponential distribution implies λ = 1/2.725. Let X denote the
duration of a rainfall event.
a. P(X ≥ 2) = 1 − P(X < 2) = 1 − P(X ≤ 2) = 1 − F(2; λ) = 1 − [1 − e^(−(1/2.725)(2))] = e^(−2/2.725) = .4800;
P(X ≤ 3) = F(3; λ) = 1 − e^(−(1/2.725)(3)) = .6674; P(2 ≤ X ≤ 3) = F(3; λ) − F(2; λ) = .6674 − .5200 = .1474.

b. For this exponential distribution, σ = μ = 2.725, so P(X > μ + 2σ) =
P(X > 2.725 + 2(2.725)) = P(X > 8.175) = 1 − F(8.175; λ) = e^(−(1/2.725)(8.175)) = e^(−3) = .0498.
On the other hand, P(X < μ − σ) = P(X < 2.725 − 2.725) = P(X < 0) = 0, since an exponential random
variable is non-negative.


63.
a. If a customer's calls are typically short, the first calling plan makes more sense. If a customer's calls
are somewhat longer, then the second plan makes more sense, viz. 99¢ is less than 20 min(10¢/min) =
$2 for the first 20 minutes under the first (flat-rate) plan.
b. h1(X) = 10X, while h2(X) = 99 for X ≤ 20 and 99 + 10(X − 20) for X > 20. With μ = 1/λ for the
exponential distribution, it's obvious that E[h1(X)] = 10E[X] = 10μ. On the other hand,
E[h2(X)] = 99 + 10 ∫_20^∞ (x − 20) λe^(−λx) dx = 99 + (10/λ)e^(−20λ) = 99 + 10μe^(−20/μ).

When μ = 10, E[h1(X)] = 100¢ = $1.00 while E[h2(X)] = 99 + 100e^(−2) = $1.13.
When μ = 15, E[h1(X)] = 150¢ = $1.50 while E[h2(X)] = 99 + 150e^(−4/3) = $1.39.
As predicted, the first plan is better when expected call length is lower, and the second plan is better
when expected call length is somewhat higher.
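The two expected-cost formulas from part b are easy to evaluate for other mean call lengths. A minimal sketch follows; the function names are illustrative, and costs are in cents as in the solution.

```python
from math import exp

def expected_cost_plan1(mu):
    """E[h1(X)] = 10*mu cents: 10 cents per minute, X exponential with mean mu."""
    return 10 * mu

def expected_cost_plan2(mu):
    """E[h2(X)] = 99 + 10*mu*exp(-20/mu) cents: 99 cents covers the first 20 minutes."""
    return 99 + 10 * mu * exp(-20 / mu)

for mu in (10, 15):
    print(mu, expected_cost_plan1(mu), round(expected_cost_plan2(mu), 1))
```

Scanning a range of `mu` values with these two functions would locate the break-even mean call length between the plans.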

65.
a. From the mean and sd equations for the gamma distribution, αβ = 37.5 and αβ² = (21.6)² = 466.56.
Take the quotient to get β = 466.56/37.5 = 12.4416. Then, α = 37.5/β = 37.5/12.4416 = 3.01408….

b. P(X > 50) = 1 − P(X ≤ 50) = 1 − F(50/12.4416; 3.014) = 1 − F(4.0187; 3.014). If we approximate this
by 1 − F(4; 3), Table A.4 gives 1 − .762 = .238. Software gives the more precise answer of .237.

c. P(50 ≤ X ≤ 75) = F(75/12.4416; 3.014) − F(50/12.4416; 3.014) = F(6.026; 3.014) − F(4.0187; 3.014) ≈
F(6; 3) − F(4; 3) = .938 − .762 = .176.
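The Table A.4 entries used above can be reproduced when the shape parameter is a positive integer, because the standard gamma cdf then has a closed form. The sketch below relies on that known identity; the helper name `std_gamma_cdf_int` is hypothetical, and non-integer shapes such as 3.014 would need numerical software instead.

```python
from math import exp, factorial

def std_gamma_cdf_int(x, alpha):
    """Standard gamma cdf F(x; alpha) for a positive integer alpha,
    via the closed form 1 - sum_{k < alpha} e^(-x) x^k / k!."""
    if x <= 0:
        return 0.0
    return 1.0 - sum(exp(-x) * x ** k / factorial(k) for k in range(alpha))

beta = 466.56 / 37.5      # scale parameter from part a
alpha = 37.5 / beta       # shape parameter, about 3.014

approx_b = 1 - std_gamma_cdf_int(4, 3)                          # approximates P(X > 50)
approx_c = std_gamma_cdf_int(6, 3) - std_gamma_cdf_int(4, 3)    # approximates P(50 <= X <= 75)
print(round(beta, 4), round(alpha, 3), round(approx_b, 3), round(approx_c, 3))
```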
67. Notice that μ = 24 and σ² = 144 ⇒ αβ = 24 and αβ² = 144 ⇒ β = 144/24 = 6 and α = 24/β = 4.

a. P(12 ≤ X ≤ 24) = F(4; 4) − F(2; 4) = .424.

b. P(X ≤ 24) = F(4; 4) = .567, so while the mean is 24, the median is less than 24, since P(X ≤ μ̃) = .5.
This is a result of the positive skew of the gamma distribution.

c. We want a value x for which F(x/β; α) = F(x/6; 4) = .99. In Table A.4, we see F(10; 4) = .990. So x/6 =
10, and the 99th percentile is 6(10) = 60.

d. We want a value t for which P(X > t) = .005, i.e. P(X ≤ t) = .995. The left-hand side is the cdf of X, so
we really want F(t/6; 4) = .995. In Table A.4, F(11; 4) = .995, so t/6 = 11 and t = 6(11) = 66. At 66
weeks, only .5% of all transistors would still be operating.

69.
a. {X ≥ t} = {the lifetime of the system is at least t}. Since the components are connected in series, this
equals {all 5 lifetimes are at least t} = A1 ∩ A2 ∩ A3 ∩ A4 ∩ A5.

b. Since the events Ai are assumed to be independent, P(X ≥ t) = P(A1 ∩ A2 ∩ A3 ∩ A4 ∩ A5) = P(A1) ·
P(A2) · P(A3) · P(A4) · P(A5). Using the exponential cdf, for any i we have
P(Ai) = P(component lifetime is ≥ t) = 1 − F(t) = 1 − [1 − e^(−.01t)] = e^(−.01t).
Therefore, P(X ≥ t) = (e^(−.01t)) ··· (e^(−.01t)) = e^(−.05t), and F_X(t) = P(X ≤ t) = 1 − e^(−.05t).
Taking the derivative, the pdf of X is f_X(t) = .05e^(−.05t) for t ≥ 0. Thus X also has an exponential
distribution, but with parameter λ = .05.

c. By the same reasoning, P(X ≤ t) = 1 − e^(−nλt), so X has an exponential distribution with parameter nλ.

71.
a. {X² ≤ y} = {−√y ≤ X ≤ √y}.

b. F_Y(y) = P(Y ≤ y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = ∫_(−√y)^(√y) [1/√(2π)] e^(−z²/2) dz. To find the pdf of Y, use the
identity (Leibniz's rule):
f_Y(y) = [1/√(2π)] e^(−(√y)²/2) · d(√y)/dy − [1/√(2π)] e^(−(−√y)²/2) · d(−√y)/dy
= [1/√(2π)] e^(−y/2) · 1/(2√y) − [1/√(2π)] e^(−y/2) · (−1/(2√y))
= [1/√(2π)] y^(−1/2) e^(−y/2).
This is valid for y > 0. We recognize this as the chi-squared pdf with ν = 1.
Section 4.5

73.
a. P(X ≤ 250) = F(250; 2.5, 200) = 1 − e^(−(250/200)^2.5) = 1 − e^(−1.75) = .8257.
P(X < 250) = P(X ≤ 250) = .8257.
P(X > 300) = 1 − F(300; 2.5, 200) = e^(−(1.5)^2.5) = .0636.

b. P(100 ≤ X ≤ 250) = F(250; 2.5, 200) − F(100; 2.5, 200) = .8257 − .162 = .6637.

c. The question is asking for the median, μ̃. Solve F(μ̃) = .5: .5 = 1 − e^(−(μ̃/200)^2.5) ⇒
e^(−(μ̃/200)^2.5) = .5 ⇒ (μ̃/200)^2.5 = −ln(.5) ⇒ μ̃ = 200(−ln(.5))^(1/2.5) = 172.727 hours.
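The Weibull calculations above follow directly from the cdf F(x; α, β) = 1 − e^(−(x/β)^α) and its inverse. A short sketch, with illustrative function names:

```python
from math import exp, log

def weibull_cdf(x, alpha, beta):
    """Weibull cdf with shape alpha and scale beta."""
    return 1 - exp(-((x / beta) ** alpha)) if x > 0 else 0.0

def weibull_percentile(p, alpha, beta):
    """Invert the cdf: the x with F(x) = p is beta * (-ln(1 - p))^(1/alpha)."""
    return beta * (-log(1 - p)) ** (1 / alpha)

alpha, beta = 2.5, 200
p_a = weibull_cdf(250, alpha, beta)                                   # P(X <= 250)
p_b = weibull_cdf(250, alpha, beta) - weibull_cdf(100, alpha, beta)   # P(100 <= X <= 250)
median = weibull_percentile(0.5, alpha, beta)
print(round(p_a, 4), round(p_b, 4), round(median, 3))
```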


75. Using the substitution y = (x/β)^α = x^α/β^α. Then dy = (αx^(α−1)/β^α) dx, and
μ = ∫_0^∞ x · (α/β^α) x^(α−1) e^(−(x/β)^α) dx = ∫_0^∞ (β^α y)^(1/α) e^(−y) dy = β ∫_0^∞ y^(1/α) e^(−y) dy = β · Γ(1 + 1/α)
by definition of the gamma function.


77.
a. E(X) = e^(4.5 + .8²/2) = e^(4.82) = 123.97.
V(X) = (e^(2(4.5) + .8²)) · (e^(.8²) − 1) = 13,776.53 ⇒ σ = 117.373.
b. P(X ≤ 100) = Φ((ln(100) − 4.5)/.8) = Φ(0.13) = .5517.
c. P(X ≥ 200) = 1 − P(X < 200) = 1 − Φ((ln(200) − 4.5)/.8) = 1 − Φ(1.00) = 1 − .8413 = .1587. Since X is continuous,
P(X > 200) = .1587 as well.
79. Notice that μ_X and σ_X are the mean and standard deviation of the lognormal variable X in this example; they
are not the parameters μ and σ which usually refer to the mean and standard deviation of ln(X). We're given
μ_X = 10,281 and σ_X/μ_X = .40, from which σ_X = .40μ_X = 4112.4.

a. To find the mean and standard deviation of ln(X), set the lognormal mean and variance equal to the
appropriate quantities: 10,281 = E(X) = e^(μ + σ²/2) and (4112.4)² = V(X) = e^(2μ + σ²)(e^(σ²) − 1). Square the first
equation: (10,281)² = e^(2μ + σ²). Now divide the variance by this amount:
(4112.4)²/(10,281)² = e^(2μ + σ²)(e^(σ²) − 1)/e^(2μ + σ²) ⇒ e^(σ²) − 1 = (.40)² = .16 ⇒ σ = √(ln(1.16)) = .38525.
That's the standard deviation of ln(X). Use this in the formula for E(X) to solve for μ:
10,281 = e^(μ + (.38525)²/2) = e^(μ + .07421) ⇒ μ = 9.164. That's E(ln(X)).

b. P(X ≤ 15,000) = P(Z ≤ (ln(15,000) − 9.164)/.38525) = P(Z ≤ 1.17) = Φ(1.17) = .8790.
c. P(X ≥ μ_X) = P(X ≥ 10,281) = P(Z ≥ (ln(10,281) − 9.164)/.38525) = P(Z ≥ .19) = 1 − Φ(0.19) = .4247. Even
though the normal distribution is symmetric, the lognormal distribution is not a symmetric distribution.
(See the lognormal graphs in the textbook.) So, the mean and the median of X aren't the same and, in
particular, the probability X exceeds its own mean doesn't equal .5.

d. One way to check is to determine whether P(X < 17,000) = .95; this would mean 17,000 is indeed the
95th percentile. However, we find that P(X < 17,000) = Φ((ln(17,000) − 9.164)/.38525) = Φ(1.50) = .9332, so
17,000 is not the 95th percentile of this distribution (it's the 93.32%ile).

81.
a. V(X) = e^(2(2.05) + .06)(e^(.06) − 1) = 3.96 ⇒ SD(X) = 1.99 months.

b. P(X > 12) = 1 − P(X ≤ 12) = 1 − P(Z ≤ (ln(12) − 2.05)/√.06) = 1 − Φ(1.78) = .0375.

c. The mean of X is E(X) = e^(2.05 + .06/2) = 8.00 months, so P(μ_X − σ_X < X < μ_X + σ_X) = P(6.01 < X < 9.99) =
Φ((ln(9.99) − 2.05)/√.06) − Φ((ln(6.01) − 2.05)/√.06) = Φ(1.03) − Φ(−1.05) = .8485 − .1469 = .7016.

d. .5 = F(x) = Φ((ln(x) − 2.05)/√.06) ⇒ (ln(x) − 2.05)/√.06 = Φ^(−1)(.5) = 0 ⇒ ln(x) − 2.05 = 0 ⇒ the median is given
by x = e^(2.05) = 7.77 months.

e. Similarly, (ln(η.99) − 2.05)/√.06 = Φ^(−1)(.99) = 2.33 ⇒ η.99 = e^(2.62) = 13.75 months.

f. The probability of exceeding 8 months is P(X > 8) = 1 − Φ((ln(8) − 2.05)/√.06) = 1 − Φ(.12) = .4522, so the
expected number that will exceed 8 months out of n = 10 is just 10(.4522) = 4.522.

83. Since the standard beta distribution lies on (0, 1), the point of symmetry must be ½, so we require that
f(½ − μ) = f(½ + μ). Cancelling out the constants, this implies
(½ − μ)^(α−1) (½ + μ)^(β−1) = (½ + μ)^(α−1) (½ − μ)^(β−1), which (by matching exponents on both sides) in turn implies
that α = β.
Alternatively, symmetry about ½ requires μ = ½, so α/(α + β) = .5. Solving for α gives α = β.
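The lognormal work in Exercises 79 and 81 reduces to normal calculations on ln(X). The sketch below assumes Python's standard-library `statistics.NormalDist`; the helper names are illustrative. Because software does not round z to two decimals, its answers differ slightly from the table-based ones.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal

def lognormal_params(mean_x, cv_x):
    """Return (mu, sigma) of ln(X) from the mean and coefficient of variation of X."""
    sigma = sqrt(log(1 + cv_x ** 2))
    mu = log(mean_x) - sigma ** 2 / 2
    return mu, sigma

def lognormal_cdf(x, mu, sigma):
    """P(X <= x) when ln(X) is normal with mean mu and sd sigma."""
    return Z.cdf((log(x) - mu) / sigma)

def lognormal_percentile(p, mu, sigma):
    """The (100p)th percentile of X."""
    return exp(mu + sigma * Z.inv_cdf(p))

# Exercise 79: mean 10,281 and coefficient of variation .40
mu79, sigma79 = lognormal_params(10_281, 0.40)
p_exceeds_mean = 1 - lognormal_cdf(10_281, mu79, sigma79)

# Exercise 81: ln(X) has mean 2.05 and variance .06
median81 = lognormal_percentile(0.5, 2.05, sqrt(0.06))
pct99_81 = lognormal_percentile(0.99, 2.05, sqrt(0.06))
print(round(mu79, 3), round(sigma79, 5), round(p_exceeds_mean, 4), round(median81, 2))
```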
85.
a. Notice from the definition of the standard beta pdf that, since a pdf must integrate to 1,
1 = ∫_0^1 [Γ(α + β)/(Γ(α)Γ(β))] x^(α−1) (1 − x)^(β−1) dx ⇒ ∫_0^1 x^(α−1) (1 − x)^(β−1) dx = Γ(α)Γ(β)/Γ(α + β).
Using this, E(X) = ∫_0^1 x · [Γ(α + β)/(Γ(α)Γ(β))] x^(α−1) (1 − x)^(β−1) dx = [Γ(α + β)/(Γ(α)Γ(β))] ∫_0^1 x^α (1 − x)^(β−1) dx =
[Γ(α + β)/(Γ(α)Γ(β))] · Γ(α + 1)Γ(β)/Γ(α + 1 + β) = [Γ(α + β)/Γ(α)] · αΓ(α)/[(α + β)Γ(α + β)] = α/(α + β).
b. Similarly, E[(1 − X)^m] = ∫_0^1 (1 − x)^m · [Γ(α + β)/(Γ(α)Γ(β))] x^(α−1) (1 − x)^(β−1) dx =
[Γ(α + β)/(Γ(α)Γ(β))] ∫_0^1 x^(α−1) (1 − x)^(m+β−1) dx = [Γ(α + β)/(Γ(α)Γ(β))] · Γ(α)Γ(m + β)/Γ(α + m + β) = Γ(α + β) · Γ(m + β)/[Γ(α + m + β)Γ(β)].
If X represents the proportion of a substance consisting of an ingredient, then 1 − X represents the
proportion not consisting of this ingredient. For m = 1 above,
E(1 − X) = Γ(α + β) · Γ(1 + β)/[Γ(α + 1 + β)Γ(β)] = Γ(α + β) · βΓ(β)/[(α + β)Γ(α + β)Γ(β)] = β/(α + β).
Section 4.6 
87. The given probability plot is quite linear, and thus it is quite plausible that the tension distribution is 
normal. 
89. The plot below shows the (observation, z percentile) pairs provided. Yes, we would feel comfortable using
a normal probability model for this variable, because the normal probability plot exhibits a strong, linear
pattern.

[Normal probability plot: observation versus z percentile]
91. The (z percentile, observation) pairs are (−1.86, .736), (−1.32, .863), (−1.01, .865),
(−.78, .913), (−.58, .915), (−.40, .937), (−.24, .983), (−.08, 1.007), (.08, 1.011), (.24, 1.064), (.40, 1.109),
(.58, 1.132), (.78, 1.140), (1.01, 1.153), (1.32, 1.253), (1.86, 1.394).

The accompanying probability plot is straight, suggesting that an assumption of population normality is
plausible.

[Normal probability plot: observation versus z percentile]


93. To check for plausibility of a lognormal population distribution for the rainfall data of Exercise 81 in
Chapter 1, take the natural logs and construct a normal probability plot. This plot and a normal probability
plot for the original data appear below. Clearly the log transformation gives quite a straight plot, so
lognormality is plausible. The curvature in the plot for the original data implies a positively skewed
population distribution, like the lognormal distribution.

[Normal probability plots of the original rainfall data and of the natural logs of the data]


95. The pattern in the plot (below, generated by Minitab) is reasonably linear. By visual inspection alone, it is 
plausible that strength is normally distributed. 


[Minitab normal probability plot of strength. Average: 134.902; StDev: 4.54186; N: 153.
Anderson-Darling Normality Test: A-Squared: 1.065; P-Value: 0.008]

97. The (100p)th percentile η(p) for the exponential distribution with λ = 1 is given by the formula
η(p) = −ln(1 − p). With n = 16, we need η(p) for p = .5/16, 1.5/16, …, 15.5/16. These are .032, .098, .170, .247, .330,
.421, .521, .633, .758, .901, 1.068, 1.269, 1.520, 1.856, 2.367, 3.466.

The accompanying plot of (percentile, failure time value) pairs exhibits substantial curvature, casting doubt
on the assumption of an exponential population distribution.

[Probability plot: failure time versus exponential percentile]

Because λ is a scale parameter (as is σ for the normal family), λ = 1 can be used to assess the plausibility of
the entire exponential family. If we used a different value of λ to find the percentiles, the slope of the graph
would change, but not its linearity (or lack thereof).
Supplementary Exercises

99.
a. For 0 ≤ y ≤ 12, F(y) = (1/24) ∫_0^y (u − u²/12) du = (1/24)[u²/2 − u³/36]_0^y = y²/48 − y³/864. Thus
F(y) = 0 for y < 0; F(y) = y²/48 − y³/864 for 0 ≤ y ≤ 12; F(y) = 1 for y > 12.
b. P(Y ≤ 4) = F(4) = .259. P(Y > 6) = 1 − F(6) = .5.
P(4 ≤ Y ≤ 6) = F(6) − F(4) = .5 − .259 = .241.
c. E(Y) = ∫_0^12 y · (1/24) y (1 − y/12) dy = (1/24) ∫_0^12 (y² − y³/12) dy = (1/24)[y³/3 − y⁴/48]_0^12 = 6 inches.
E(Y²) = (1/24) ∫_0^12 (y³ − y⁴/12) dy = 43.2, so V(Y) = 43.2 − 36 = 7.2.
d. P(Y < 4 or Y > 8) = 1 − P(4 ≤ Y ≤ 8) = 1 − [F(8) − F(4)] = .518.
e. The shorter segment has length equal to min(Y, 12 − Y), and
E[min(Y, 12 − Y)] = ∫_0^12 min(y, 12 − y) · f(y) dy = ∫_0^6 min(y, 12 − y) · f(y) dy
+ ∫_6^12 min(y, 12 − y) · f(y) dy = ∫_0^6 y · f(y) dy + ∫_6^12 (12 − y) · f(y) dy = 90/24 = 3.75 inches.
101.
a. By differentiation, f(x) = x² for 0 ≤ x < 1; f(x) = 7/4 − (3/4)x for 1 ≤ x ≤ 7/3; f(x) = 0 otherwise.

b. P(.5 ≤ X ≤ 2) = F(2) − F(.5) = [1 − (1/2)(7/3 − 2)(7/4 − (3/4)(2))] − (.5)³/3 = 11/12 = .917.

c. Using the pdf from a, E(X) = ∫_0^1 x · x² dx + ∫_1^(7/3) x · (7/4 − (3/4)x) dx = 131/108 = 1.213.
103.
a. P(X > 135) = 1 − Φ((135 − 137.2)/1.6) = 1 − Φ(−1.38) = 1 − .0838 = .9162.

b. With Y = the number among ten that contain more than 135 oz, Y ~ Bin(10, .9162).
So, P(Y ≥ 8) = b(8; 10, .9162) + b(9; 10, .9162) + b(10; 10, .9162) = .9549.

c. We want P(X > 135) = .95, i.e. 1 − Φ((135 − 137.2)/σ) = .95 or Φ((135 − 137.2)/σ) = .05. From the standard
normal table, (135 − 137.2)/σ = −1.65 ⇒ σ = 1.33.


105. Let A = the cork is acceptable and B = the first machine was used. The goal is to find P(B | A), which can
be obtained from Bayes' rule:
P(B | A) = P(B)P(A | B)/[P(B)P(A | B) + P(B′)P(A | B′)] = .6P(A | B)/[.6P(A | B) + .4P(A | B′)].
From Exercise 38, P(A | B) = P(machine 1 produces an acceptable cork) = .6826 and P(A | B′) = P(machine
2 produces an acceptable cork) = .9987. Therefore,
P(B | A) = .6(.6826)/[.6(.6826) + .4(.9987)] = .5062.
107.
a. [Graph of the pdf of X]

b. F(x) = 0 for x < −1, and F(x) = 1 for x > 2. For −1 ≤ x ≤ 2, F(x) = ∫_(−1)^x (1/9)(4 − y²) dy = (11 + 12x − x³)/27. This is
graphed below.

[Graph of the cdf F(x)]

c. The median is 0 iff F(0) = .5. Since F(0) = 11/27, this is not the case. Because 11/27 < .5, the median must
be greater than 0. (Looking at the pdf in a it's clear that the line x = 0 does not evenly divide the
distribution, and that such a line must lie to the right of x = 0.)

d. Y is a binomial rv, with n = 10 and p = P(X > 1) = 1 − F(1) = 5/27.


109. Below, exp(u) is alternative notation for e^u.
a. P(X ≤ 150) = exp[−exp(−(150 − 150)/90)] = exp[−exp(0)] = exp(−1) = .368,
P(X ≤ 300) = exp[−exp(−1.6667)] = .828, and P(150 ≤ X ≤ 300) = .828 − .368 = .460.

b. The desired value c is the 90th percentile, so c satisfies
.9 = exp[−exp(−(c − 150)/90)]. Taking the natural log of each side twice in succession yields −(c − 150)/90
= ln[−ln(.9)] = −2.250367, so c = 90(2.250367) + 150 = 352.53.

c. Use the chain rule: f(x) = F′(x) = exp[−exp(−(x − α)/β)] · [−exp(−(x − α)/β)] · (−1/β) =
(1/β) exp[−exp(−(x − α)/β) − (x − α)/β].

d. We wish the value of x for which f(x) is a maximum; from calculus, this is the same as the value of x
for which ln[f(x)] is a maximum, and ln[f(x)] = −ln β − e^(−(x−α)/β) − (x − α)/β. The derivative of ln[f(x)] is
d/dx [−ln β − e^(−(x−α)/β) − (x − α)/β] = 0 + (1/β)e^(−(x−α)/β) − 1/β; set this equal to 0 and we get e^(−(x−α)/β) = 1, so
−(x − α)/β = 0, which implies that x = α. Thus the mode is α.

e. E(X) = .5772β + α = 201.95, whereas the mode is α = 150 and the median is the solution to F(x) = .5.
From b, this equals −90 ln[−ln(.5)] + 150 = 182.99.
Since mode < median < mean, the distribution is positively skewed. A plot of the pdf appears below.
111.
a. From a graph of the normal pdf or by differentiation of the pdf, x* = μ.

b. No; the density function has constant height for A ≤ x ≤ B.

c. f(x; λ) is largest for x = 0 (the derivative at 0 does not exist since f is not continuous there), so x* = 0.

d. ln[f(x; α, β)] = −ln(β^α) − ln(Γ(α)) + (α − 1)ln(x) − x/β, and d/dx ln[f(x; α, β)] = (α − 1)/x − 1/β. Setting this
equal to 0 gives the mode: x* = (α − 1)β.

e. The chi-squared distribution is the gamma distribution with α = ν/2 and β = 2. From d,
x* = (ν/2 − 1)(2) = ν − 2.

113.
a. E(X) = ∫_0^∞ x · [pλ1 e^(−λ1 x) + (1 − p)λ2 e^(−λ2 x)] dx = p ∫_0^∞ xλ1 e^(−λ1 x) dx + (1 − p) ∫_0^∞ xλ2 e^(−λ2 x) dx = p/λ1 + (1 − p)/λ2. (Each of
the two integrals represents the expected value of an exponential random variable, which is the
reciprocal of λ.) Similarly, since the mean-square value of an exponential rv is E(Y²) = V(Y) + [E(Y)]² =
1/λ² + 1/λ² = 2/λ², E(X²) = ∫_0^∞ x² f(x) dx = … = 2p/λ1² + 2(1 − p)/λ2². From this,
V(X) = 2p/λ1² + 2(1 − p)/λ2² − [p/λ1 + (1 − p)/λ2]².

b. For x > 0, F(x; λ1, λ2, p) = ∫_0^x f(y; λ1, λ2, p) dy = ∫_0^x [pλ1 e^(−λ1 y) + (1 − p)λ2 e^(−λ2 y)] dy =
p ∫_0^x λ1 e^(−λ1 y) dy + (1 − p) ∫_0^x λ2 e^(−λ2 y) dy = p(1 − e^(−λ1 x)) + (1 − p)(1 − e^(−λ2 x)). For x ≤ 0, F(x) = 0.

c. P(X > .01) = 1 − F(.01) = 1 − [.5(1 − e^(−40(.01))) + (1 − .5)(1 − e^(−200(.01)))] = .5e^(−.4) + .5e^(−2) = .403.

d. Using the expressions in a, μ = .015 and σ² = .000425 ⇒ σ = .0206. Hence,
P(μ − σ < X < μ + σ) = P(−.0056 < X < .0356) = P(X < .0356) because X can't be negative
= F(.0356) = … = .879.

e. For an exponential rv, CV = σ/μ = (1/λ)/(1/λ) = 1. For X hyperexponential,
CV = σ/μ = √(E(X²) − μ²)/μ = √(E(X²)/μ² − 1) = √([2p/λ1² + 2(1 − p)/λ2²]/[p/λ1 + (1 − p)/λ2]² − 1)
= √(2(pλ2² + (1 − p)λ1²)/(pλ2 + (1 − p)λ1)² − 1) = √(2r − 1), where r = (pλ2² + (1 − p)λ1²)/(pλ2 + (1 − p)λ1)². But straightforward algebra shows that
r > 1 when λ1 ≠ λ2, so that CV > 1.

f. For the Erlang distribution, μ = n/λ and σ = √n/λ, so CV = 1/√n < 1 for n > 1.
115.
a. Since ln(I_o/I_i) has a normal distribution, by definition I_o/I_i has a lognormal distribution.
b. P(I_o > 2I_i) = P(I_o/I_i > 2) = P(ln(I_o/I_i) > ln 2) = 1 − P(ln(I_o/I_i) ≤ ln 2) = 1 − P(X ≤ ln 2) =
1 − Φ((ln 2 − 1)/.05) = 1 − Φ(−6.14) ≈ 1.
c. E(I_o/I_i) = e^(1 + .0025/2) = 2.72 and V(I_o/I_i) = e^(2 + .0025) · (e^(.0025) − 1) = .0185.
2 Pla : 
117. F(y) = P(Y ≤ y) = P(σZ + μ ≤ y) = P(Z ≤ (y − μ)/σ) = ∫_(−∞)^((y−μ)/σ) [1/√(2π)] e^(−z²/2) dz ⇒ by the fundamental theorem of
calculus, f(y) = F′(y) = [1/√(2π)] e^(−((y−μ)/σ)²/2) · (1/σ) = [1/(√(2π) σ)] e^(−(y−μ)²/(2σ²)), a normal pdf with parameters μ and σ.
119.
a. Y = −ln(X) ⇒ x = e^(−y) = k(y), so k′(y) = −e^(−y). Thus since f(x) = 1, g(y) = 1 · |−e^(−y)| = e^(−y) for 0 < y < ∞.
Y has an exponential distribution with parameter λ = 1.
b. y = σz + μ ⇒ z = k(y) = (y − μ)/σ and k′(y) = 1/σ, from which the result follows easily.
c. y = h(x) = cx ⇒ x = k(y) = y/c and k′(y) = 1/c, from which the result follows easily.
121.
a. Assuming the three birthdays are independent and that all 365 days of the calendar year are equally
likely, P(all 3 births occur on March 11) = (1/365)³.
b. P(all 3 births on the same day) = P(all 3 on Jan. 1) + P(all 3 on Jan. 2) + … = (1/365)³ + (1/365)³ + … =
365(1/365)³ = (1/365)².
c. Let X = deviation from due date, so X ~ N(0, 19.88). The baby due on March 15 was 4 days early, and
P(X = −4) ≈ P(−4.5 < X < −3.5) = Φ(−3.5/19.88) − Φ(−4.5/19.88) =
Φ(−.18) − Φ(−.23) = .4286 − .4090 = .0196.
Similarly, the baby due on April 1 was 21 days early, and P(X = −21) ≈
Φ(−20.5/19.88) − Φ(−21.5/19.88) = Φ(−1.03) − Φ(−1.08) = .1515 − .1401 = .0114.
Finally, the baby due on April 4 was 24 days early, and P(X = −24) ≈ .0097.

Again assuming independence, P(all 3 births occurred on March 11) = (.0196)(.0114)(.0097) ≈
.0000022.

d. To calculate the probability of the three births happening on any day, we could make similar
calculations as in part c for each possible day, and then add the probabilities.


123.
a. F(x) = P(X ≤ x) = P(−(1/λ) ln(1 − U) ≤ x) = P(ln(1 − U) ≥ −λx) = P(1 − U ≥ e^(−λx))
= P(U ≤ 1 − e^(−λx)) = 1 − e^(−λx) since the cdf of a uniform rv on [0, 1] is simply F(u) = u. Thus X has an
exponential distribution with parameter λ.

b. By taking successive random numbers u1, u2, u3, … and computing xi = −(1/10) ln(1 − ui) for each one, we
obtain a sequence of values generated from an exponential distribution with parameter λ = 10.
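Part b describes inverse-transform sampling, which can be tried out in a few lines. The sketch below assumes Python's standard-library `random` module; the function names, seed, and sample size are arbitrary illustrative choices.

```python
import random
from math import log

def exponential_from_uniform(u, lam):
    """Map a uniform value u in [0, 1) to an exponential value via x = -ln(1 - u)/lam."""
    return -log(1 - u) / lam

def simulate(n, lam, seed=0):
    """Generate n exponential(lam) values from n uniform random numbers."""
    rng = random.Random(seed)   # seeded only so the run is reproducible
    return [exponential_from_uniform(rng.random(), lam) for _ in range(n)]

sample = simulate(100_000, lam=10)
sample_mean = sum(sample) / len(sample)
print(round(sample_mean, 4))   # should be near the theoretical mean 1/lam = 0.1
```

The sample mean landing near 1/λ is a quick sanity check that the transformation produces the intended distribution.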


125. If g(x) is convex in a neighborhood of μ, then g(μ) + g′(μ)(x − μ) ≤ g(x). Replace x by X:
E[g(μ) + g′(μ)(X − μ)] ≤ E[g(X)] ⇒ E[g(X)] ≥ g(μ) + g′(μ)E[(X − μ)] = g(μ) + g′(μ) · 0 = g(μ).
That is, if g(x) is convex, g(E(X)) ≤ E[g(X)].


127.
a. E(X) = 150 + (850 − 150) · 8/(8 + 2) = 710 and V(X) = (850 − 150)²(8)(2)/[(8 + 2)²(8 + 2 + 1)] = 7127.27 ⇒ SD(X) = 84.423.
Using software, P(|X − 710| ≤ 84.423) = P(625.577 ≤ X ≤ 794.423) =
∫_625.577^794.423 (1/700) [Γ(10)/(Γ(8)Γ(2))] ((x − 150)/700)^7 ((850 − x)/700)^1 dx = .684.

b. P(X > 750) = ∫_750^850 (1/700) [Γ(10)/(Γ(8)Γ(2))] ((x − 150)/700)^7 ((850 − x)/700)^1 dx = .376. Again, the computation of the
requested integral requires a calculator or computer.
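One way to carry out the "calculator or computer" step is numerical integration of the beta pdf on [150, 850]. The sketch below uses a hand-written composite Simpson's rule as an illustrative choice; it is not necessarily the method behind the quoted answers, and the function names are hypothetical.

```python
from math import gamma

A, B, ALPHA, BETA = 150, 850, 8, 2   # interval endpoints and beta parameters

def beta_pdf(x):
    """pdf of the beta distribution with parameters ALPHA, BETA on [A, B]."""
    if not A <= x <= B:
        return 0.0
    const = gamma(ALPHA + BETA) / (gamma(ALPHA) * gamma(BETA)) / (B - A)
    u = (x - A) / (B - A)
    return const * u ** (ALPHA - 1) * (1 - u) ** (BETA - 1)

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

mean = A + (B - A) * ALPHA / (ALPHA + BETA)
sd = ((B - A) ** 2 * ALPHA * BETA / ((ALPHA + BETA) ** 2 * (ALPHA + BETA + 1))) ** 0.5
p_within_one_sd = simpson(beta_pdf, mean - sd, mean + sd)   # part a
p_above_750 = simpson(beta_pdf, 750, B)                     # part b
print(mean, round(sd, 3), round(p_within_one_sd, 3), round(p_above_750, 3))
```

Because the integrand is a polynomial, Simpson's rule converges quickly here, and the results should land close to the values quoted above.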
CHAPTER 5


Section 5.1 


1.
a. P(X = 1, Y = 1) = p(1,1) = .20.
b. P(X ≤ 1 and Y ≤ 1) = p(0,0) + p(0,1) + p(1,0) + p(1,1) = .42.
c. At least one hose is in use at both islands. P(X ≠ 0 and Y ≠ 0) = p(1,1) + p(1,2) + p(2,1) + p(2,2) = .70.

d. By summing row probabilities, p_X(x) = .16, .34, .50 for x = 0, 1, 2. By summing column probabilities,
p_Y(y) = .24, .38, .38 for y = 0, 1, 2. P(X ≤ 1) = p_X(0) + p_X(1) = .50.

e. p(0,0) = .10, but p_X(0) · p_Y(0) = (.16)(.24) = .0384 ≠ .10, so X and Y are not independent.


3.
a. p(1,1) = .15, the entry in the 1st row and 1st column of the joint probability table.
b. P(X1 = X2) = p(0,0) + p(1,1) + p(2,2) + p(3,3) = .08 + .15 + .10 + .07 = .40.

c. A = {X1 ≥ 2 + X2 ∪ X2 ≥ 2 + X1}, so P(A) = p(2,0) + p(3,0) + p(4,0) + p(3,1) + p(4,1) + p(4,2) + p(0,2)
+ p(0,3) + p(1,3) = .22.

d. P(X1 + X2 = 4) = p(1,3) + p(2,2) + p(3,1) + p(4,0) = .17.
P(X1 + X2 ≥ 4) = P(X1 + X2 = 4) + p(4,1) + p(4,2) + p(4,3) + p(3,2) + p(3,3) + p(2,3) = .46.

5.
a. p(3, 3) = P(X = 3, Y = 3) = P(3 customers, each with 1 package)
= P(each has 1 package | 3 customers) · P(3 customers) = (.6)³ · (.25) = .054.

b. p(4, 11) = P(X = 4, Y = 11) = P(total of 11 packages | 4 customers) · P(4 customers).
Given that there are 4 customers, there are four different ways to have a total of 11 packages: 3, 3, 3, 2
or 3, 3, 2, 3 or 3, 2, 3, 3 or 2, 3, 3, 3. Each way has probability (.1)³(.3), so p(4, 11) = 4(.1)³(.3)(.15) =
.00018.

7.
a. p(1,1) = .030.

b. P(X ≤ 1 and Y ≤ 1) = p(0,0) + p(0,1) + p(1,0) + p(1,1) = .120.

c. P(X = 1) = p(1,0) + p(1,1) + p(1,2) = .100; P(Y = 1) = p(0,1) + … + p(5,1) = .300.

d. P(overflow) = P(X + 3Y > 5) = 1 − P(X + 3Y ≤ 5) = 1 − P((X,Y) = (0,0) or … or (5,0) or (0,1) or (1,1) or
(2,1)) = 1 − .620 = .380.

e. The marginal probabilities for X (row sums from the joint probability table) are p_X(0) = .05, p_X(1) =
.10, p_X(2) = .25, p_X(3) = .30, p_X(4) = .20, p_X(5) = .10; those for Y (column sums) are p_Y(0) = .5, p_Y(1) =
.3, p_Y(2) = .2. It is now easily verified that for every (x,y), p(x,y) = p_X(x) · p_Y(y), so X and Y are
independent.


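9.
a. 1 = ∫∫ f(x,y) dx dy = ∫_20^30 ∫_20^30 K(x² + y²) dx dy = K ∫_20^30 ∫_20^30 x² dy dx + K ∫_20^30 ∫_20^30 y² dx dy
= 10K ∫_20^30 x² dx + 10K ∫_20^30 y² dy = 20K · (19,000/3) ⇒ K = 3/380,000.

b. P(X < 26 and Y < 26) = ∫_20^26 ∫_20^26 K(x² + y²) dy dx = K ∫_20^26 [x²y + y³/3]_20^26 dx = K ∫_20^26 (6x² + 3192) dx =
K(38,304) = .3024.

c. The region of integration is labeled III below.

[Graph: the square 20 ≤ x ≤ 30, 20 ≤ y ≤ 30 divided by the lines y = x + 2 and y = x − 2 into regions I, II, and III]

P(|X − Y| ≤ 2) = ∫∫_III f(x,y) dx dy = 1 − ∫∫_I f(x,y) dx dy − ∫∫_II f(x,y) dx dy =
1 − ∫_20^28 ∫_(x+2)^30 f(x,y) dy dx − ∫_22^30 ∫_20^(x−2) f(x,y) dy dx = .3593 (after much algebra).

d. f_X(x) = ∫_(−∞)^∞ f(x,y) dy = ∫_20^30 K(x² + y²) dy = 10Kx² + K[y³/3]_20^30 = 10Kx² + .05, for 20 ≤ x ≤ 30.

e. f_Y(y) can be obtained by substituting y for x in (d); clearly f(x,y) ≠ f_X(x) · f_Y(y), so X and Y are not
independent.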


11.
a. Since X and Y are independent, p(x,y) = p_X(x) · p_Y(y) = [e^(−μ1) μ1^x/x!] · [e^(−μ2) μ2^y/y!] = e^(−μ1−μ2) μ1^x μ2^y/(x! y!)
for x = 0, 1, 2, …; y = 0, 1, 2, ….
b. P(X + Y ≤ 1) = p(0,0) + p(0,1) + p(1,0) = … = e^(−μ1−μ2)[1 + μ1 + μ2].
c. P(X + Y = m) = Σ_(k=0)^m P(X = k, Y = m − k) = e^(−μ1−μ2) Σ_(k=0)^m μ1^k μ2^(m−k)/[k!(m − k)!]
= [e^(−μ1−μ2)/m!] Σ_(k=0)^m {m!/[k!(m − k)!]} μ1^k μ2^(m−k)
= [e^(−μ1−μ2)/m!] (μ1 + μ2)^m by the binomial theorem. We recognize this as the pmf of a
Poisson random variable with parameter μ1 + μ2. Therefore, the total number of errors, X + Y, also has
a Poisson distribution, with parameter μ1 + μ2.
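The convolution argument in part c can be checked numerically for particular parameter values. In this sketch the values μ1 = 1.2 and μ2 = 0.7 are arbitrary illustrations (they do not come from the exercise), and the function names are hypothetical.

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """Poisson pmf p(k; mu) = e^(-mu) mu^k / k!."""
    return exp(-mu) * mu ** k / factorial(k)

def sum_pmf(m, mu1, mu2):
    """P(X + Y = m) by summing P(X = k) P(Y = m - k) over k = 0..m."""
    return sum(poisson_pmf(k, mu1) * poisson_pmf(m - k, mu2) for k in range(m + 1))

mu1, mu2 = 1.2, 0.7   # illustrative values only
for m in range(5):
    print(m, round(sum_pmf(m, mu1, mu2), 6), round(poisson_pmf(m, mu1 + mu2), 6))
```

The two printed columns agree, matching the conclusion that X + Y is Poisson with parameter μ1 + μ2.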


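13.
a. f(x,y) = f_X(x) · f_Y(y) = e^(−x−y) for x ≥ 0, y ≥ 0; 0 otherwise.

b. By independence, P(X ≤ 1 and Y ≤ 1) = P(X ≤ 1) · P(Y ≤ 1) = (1 − e^(−1))(1 − e^(−1)) = .400.

c. P(X + Y ≤ 2) = ∫_0^2 ∫_0^(2−x) e^(−x−y) dy dx = ∫_0^2 e^(−x)[1 − e^(−(2−x))] dx = ∫_0^2 (e^(−x) − e^(−2)) dx = 1 − e^(−2) − 2e^(−2) = .594.

d. P(X + Y ≤ 1) = ∫_0^1 e^(−x)[1 − e^(−(1−x))] dx = 1 − 2e^(−1) = .264,
so P(1 ≤ X + Y ≤ 2) = P(X + Y ≤ 2) − P(X + Y ≤ 1) = .594 − .264 = .330.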


15.
a. Each Xi has cdf F(x) = P(Xi ≤ x) = 1 − e^(−λx). Using this, the cdf of Y is
F(y) = P(Y ≤ y) = P(X1 ≤ y ∪ (X2 ≤ y ∩ X3 ≤ y))
= P(X1 ≤ y) + P(X2 ≤ y ∩ X3 ≤ y) − P(X1 ≤ y ∩ (X2 ≤ y ∩ X3 ≤ y))
= (1 − e^(−λy)) + (1 − e^(−λy))² − (1 − e^(−λy))³ for y > 0.

The pdf of Y is f(y) = F′(y) = λe^(−λy) + 2(1 − e^(−λy))(λe^(−λy)) − 3(1 − e^(−λy))²(λe^(−λy)) = 4λe^(−2λy) − 3λe^(−3λy)
for y > 0.

b. E(Y) = ∫_0^∞ y · (4λe^(−2λy) − 3λe^(−3λy)) dy = 2(1/(2λ)) − 1/(3λ) = 2/(3λ).


17.
a. Let A denote the disk of radius R/2. Then P((X,Y) lies in A) = ∫∫_A f(x,y) dx dy
= ∫∫_A [1/(πR²)] dx dy = [1/(πR²)] ∫∫_A dx dy = (area of A)/(πR²) = π(R/2)²/(πR²) = 1/4 = .25. Notice that, since the joint pdf of X
and Y is a constant (i.e., (X,Y) is uniform over the disk), it will be the case for any subset A that P((X,Y)
lies in A) = (area of A)/(πR²).
b. By the same ratio-of-areas idea, P(−R/2 ≤ X ≤ R/2, −R/2 ≤ Y ≤ R/2) = R²/(πR²) = 1/π. This region is the square
depicted in the graph below.

[Graph: square with side R centered inside the circle of radius R]

c. Similarly, P(−R/√2 ≤ X ≤ R/√2, −R/√2 ≤ Y ≤ R/√2) = 2R²/(πR²) = 2/π. This region is the slightly larger square
depicted in the graph below, whose corners actually touch the circle.

[Graph: square with side √2·R inscribed in the circle of radius R]

d. f_X(x) = ∫_(−∞)^∞ f(x,y) dy = ∫_(−√(R²−x²))^(√(R²−x²)) [1/(πR²)] dy = 2√(R² − x²)/(πR²) for −R ≤ x ≤ R.
Similarly, f_Y(y) = 2√(R² − y²)/(πR²) for −R ≤ y ≤ R. X and Y are not independent, since the joint pdf is not
the product of the marginal pdfs: 1/(πR²) ≠ [2√(R² − x²)/(πR²)] · [2√(R² − y²)/(πR²)].


19. Throughout these solutions, K = 3/380,000, as calculated in Exercise 9.

a. f_(Y|X)(y | x) = f(x,y)/f_X(x) = K(x² + y²)/(10Kx² + .05) for 20 ≤ y ≤ 30.
f_(X|Y)(x | y) = f(x,y)/f_Y(y) = K(x² + y²)/(10Ky² + .05) for 20 ≤ x ≤ 30.

b. P(Y ≥ 25 | X = 22) = ∫_25^30 f_(Y|X)(y | 22) dy = ∫_25^30 K((22)² + y²)/(10K(22)² + .05) dy = .5559.
P(Y ≥ 25) = ∫_25^30 f_Y(y) dy = ∫_25^30 (10Ky² + .05) dy = .549. So, given that the right tire pressure is 22 psi, it's
only slightly more likely that the left tire pressure is at least 25 psi; the two probabilities are nearly equal.

c. E(Y | X = 22) = ∫_(−∞)^∞ y · f_(Y|X)(y | 22) dy = ∫_20^30 y · K((22)² + y²)/(10K(22)² + .05) dy = 25.373 psi.
E(Y² | X = 22) = ∫_20^30 y² · K((22)² + y²)/(10K(22)² + .05) dy = 652.03 ⇒
V(Y | X = 22) = E(Y² | X = 22) − [E(Y | X = 22)]² = 652.03 − (25.373)² = 8.24 ⇒
SD(Y | X = 22) = 2.87 psi.
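Since every integrand in this exercise is a polynomial, the conditional quantities can be computed exactly. The sketch below uses Python's standard-library `fractions.Fraction`; the helper names are illustrative and not part of the original solution.

```python
from fractions import Fraction

K = Fraction(3, 380_000)   # normalizing constant from Exercise 9

def int_power(n, a, b):
    """Exact integral of y**n from a to b (a, b integers)."""
    return Fraction(b ** (n + 1) - a ** (n + 1), n + 1)

def marginal(x):
    """f_X(x) = 10*K*x^2 + .05 on [20, 30]."""
    return 10 * K * x ** 2 + Fraction(1, 20)

def cond_integral(n, x, a=20, b=30):
    """Integral over [a, b] of y^n * f(y | x), where f(y | x) = K(x^2 + y^2)/f_X(x)."""
    numer = K * (x ** 2 * int_power(n, a, b) + int_power(n + 2, a, b))
    return numer / marginal(x)

p_cond = cond_integral(0, 22, a=25)             # P(Y >= 25 | X = 22)
mean_cond = cond_integral(1, 22)                # E(Y | X = 22)
var_cond = cond_integral(2, 22) - mean_cond ** 2
print(float(p_cond), float(mean_cond), float(var_cond) ** 0.5)
```

Exact rational arithmetic avoids any rounding until the final conversion to a decimal.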
21.
a. f_(X3|X1,X2)(x3 | x1, x2) = f(x1, x2, x3)/f_(X1,X2)(x1, x2), where f_(X1,X2)(x1, x2) = the marginal joint pdf of X1 and X2, i.e.
f_(X1,X2)(x1, x2) = ∫_(−∞)^∞ f(x1, x2, x3) dx3.

b. f_(X2,X3|X1)(x2, x3 | x1) = f(x1, x2, x3)/f_(X1)(x1), where f_(X1)(x1) = ∫_(−∞)^∞ ∫_(−∞)^∞ f(x1, x2, x3) dx2 dx3, the marginal pdf of X1.


Section 5.2 


23. E(X1 − X2) = Σ_(x1=0)^4 Σ_(x2=0)^3 (x1 − x2) · p(x1, x2) = (0 − 0)(.08) + (0 − 1)(.07) + … + (4 − 3)(.06) = .15.
Note: It can be shown that E(X1 − X2) always equals E(X1) − E(X2), so in this case we could also work out
the means of X1 and X2 from their marginal distributions: E(X1) = 1.70 and E(X2) = 1.55, so E(X1 − X2) =
E(X1) − E(X2) = 1.70 − 1.55 = .15.


25. The expected value of X, being uniform on [L − A, L + A], is simply the midpoint of the interval, L. Since Y
has the same distribution, E(Y) = L as well. Finally, since X and Y are independent,
E(area) = E(XY) = E(X) · E(Y) = L · L = L².

27. The amount of time Annie waits for Alvie, if Annie arrives first, is Y − X; similarly, the time Alvie waits for
Annie is X − Y. Either way, the amount of time the first person waits for the second person is
h(X, Y) = |X − Y|. Since X and Y are independent, their joint pdf is given by f_X(x) · f_Y(y) = (3x²)(2y) = 6x²y.
From these, the expected waiting time is
E[h(X,Y)] = ∫_0^1 ∫_0^1 |x − y| · f(x,y) dx dy = ∫_0^1 ∫_0^1 |x − y| · 6x²y dx dy
= ∫_0^1 ∫_0^x (x − y) · 6x²y dy dx + ∫_0^1 ∫_x^1 (y − x) · 6x²y dy dx = 1/6 + 1/12 = 1/4 hour, or 15 minutes.


29. Cov(X,Y) = −2/75 and μ_X = μ_Y = 2/5.
E(X²) = ∫_0^1 x² · f_X(x) dx = 12 ∫_0^1 x³(1 − x)² dx = 12/60 = 1/5, so V(X) = 1/5 − (2/5)² = 1/25.
Similarly, V(Y) = 1/25, so ρ_(X,Y) = (−2/75)/[√(1/25) · √(1/25)] = −50/75 = −2/3.
31.
a. E(X) = ∫_20^30 x f_X(x) dx = ∫_20^30 x[10Kx² + .05] dx = 1925/76 = 25.329 = E(Y),
E(XY) = ∫_20^30 ∫_20^30 xy · K(x² + y²) dx dy = 24375/38 = 641.447 ⇒
Cov(X, Y) = 641.447 − (25.329)² = −.1082.

b. E(X²) = ∫_20^30 x²[10Kx² + .05] dx = 37040/57 = 649.8246 = E(Y²) ⇒
V(X) = V(Y) = 649.8246 − (25.329)² = 8.2664 ⇒ ρ = −.1082/√((8.2664)(8.2664)) = −.0131.
33. Since E(XY) = E(X) · E(Y), Cov(X, Y) = E(XY) − E(X) · E(Y) = E(X) · E(Y) − E(X) · E(Y) = 0, and since
Corr(X, Y) = Cov(X, Y)/(σ_X σ_Y), then Corr(X, Y) = 0.


35.
a. Cov(aX + b, cY + d) = E[(aX + b)(cY + d)] − E(aX + b) · E(cY + d)
= E[acXY + adX + bcY + bd] − (aE(X) + b)(cE(Y) + d)
= acE(XY) + adE(X) + bcE(Y) + bd − [acE(X)E(Y) + adE(X) + bcE(Y) + bd]
= acE(XY) − acE(X)E(Y) = ac[E(XY) − E(X)E(Y)] = acCov(X, Y).

b. Corr(aX + b, cY + d) = Cov(aX + b, cY + d)/[SD(aX + b)SD(cY + d)] = acCov(X, Y)/[|a| · |c| SD(X)SD(Y)] = (ac/|ac|) Corr(X, Y). When a
and c have the same signs, ac = |ac|, and we have
Corr(aX + b, cY + d) = Corr(X, Y).

c. When a and c differ in sign, |ac| = −ac, and we have Corr(aX + b, cY + d) = −Corr(X, Y).


Section 5.3 


37. The joint pmf of X1 and X2 is presented below. Each joint probability is calculated using the independence of X1 and X2; e.g., p(25, 25) = P(X1 = 25) · P(X2 = 25) = (.2)(.2) = .04.

                    x1
p(x1, x2)      25     40     65
        25    .04    .10    .06
x2      40    .10    .25    .15
        65    .06    .15    .09

a. For each coordinate in the table above, calculate x̄. The six possible resulting x̄ values and their corresponding probabilities appear in the accompanying pmf table.

x̄        25     32.5   40     45     52.5   65
p(x̄)     .04    .20    .25    .12    .30    .09

From the table, E(X̄) = (25)(.04) + 32.5(.20) + ... + 65(.09) = 44.5. From the original pmf, μ = 25(.2) + 40(.5) + 65(.3) = 44.5. So, E(X̄) = μ.

b. For each coordinate in the joint pmf table above, calculate $s^2 = \frac{1}{2-1}\sum_{i=1}^{2}(x_i - \bar{x})^2$. The four possible resulting s² values and their corresponding probabilities appear in the accompanying pmf table.

s²        0      112.5  312.5  800
p(s²)     .38    .20    .30    .12

From the table, E(S²) = 0(.38) + ... + 800(.12) = 212.25. From the original pmf,
σ² = (25 − 44.5)²(.2) + (40 − 44.5)²(.5) + (65 − 44.5)²(.3) = 212.25. So, E(S²) = σ².
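The enumeration in Exercise 37 is mechanical enough to reproduce in a few lines. The following sketch, which is not from the original solution, builds the sampling distributions of X̄ and S² for n = 2 by listing every (x1, x2) pair; the helper names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Population pmf from the exercise: P(X = 25) = .2, P(X = 40) = .5, P(X = 65) = .3
pmf = {25: Fraction(2, 10), 40: Fraction(5, 10), 65: Fraction(3, 10)}


def sampling_distributions(pmf):
    """Return the pmfs of the sample mean and sample variance for n = 2."""
    xbar_pmf, s2_pmf = {}, {}
    for (x1, p1), (x2, p2) in product(pmf.items(), repeat=2):
        prob = p1 * p2                      # independence of X1 and X2
        xbar = Fraction(x1 + x2, 2)
        s2 = Fraction((x1 - x2) ** 2, 2)    # for n = 2, s^2 = (x1 - x2)^2 / 2
        xbar_pmf[xbar] = xbar_pmf.get(xbar, 0) + prob
        s2_pmf[s2] = s2_pmf.get(s2, 0) + prob
    return xbar_pmf, s2_pmf


def expectation(dist):
    return sum(value * prob for value, prob in dist.items())


xbar_pmf, s2_pmf = sampling_distributions(pmf)

if __name__ == "__main__":
    print(float(expectation(xbar_pmf)), float(expectation(s2_pmf)))
```

The two expectations match μ and σ² of the population, which is the unbiasedness property the exercise illustrates.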
39. X is a binomial random variable with n = 15 and p = .8. The values of X, then X/n = X/15, along with the corresponding probabilities b(x; 15, .8) are displayed in the accompanying pmf table.

x          0     1     2     3     4     5     6     7     8     9     10
x/15       0     .067  .133  .2    .267  .333  .4    .467  .533  .6    .667
p(x/15)    .000  .000  .000  .000  .000  .000  .001  .003  .014  .043  .103

x          11    12    13    14    15
x/15       .733  .8    .867  .933  1
p(x/15)    .188  .250  .231  .132  .035


41. The tables below delineate all 16 possible (x1, x2) pairs, their probabilities, the value of x̄ for that pair, and the value of r for that pair. Probabilities are calculated using the independence of X1 and X2.

(x1, x2)      1,1   1,2   1,3   1,4   2,1   2,2   2,3   2,4
probability   .16   .12   .08   .04   .12   .09   .06   .03
x̄             1     1.5   2     2.5   1.5   2     2.5   3
r             0     1     2     3     1     0     1     2

(x1, x2)      3,1   3,2   3,3   3,4   4,1   4,2   4,3   4,4
probability   .08   .06   .04   .02   .04   .03   .02   .01
x̄             2     2.5   3     3.5   2.5   3     3.5   4
r             2     1     0     1     3     2     1     0

a. Collecting the x̄ values from the table above yields the pmf table below.

x̄        1     1.5   2     2.5   3     3.5   4
p(x̄)     .16   .24   .25   .20   .10   .04   .01

b. P(X̄ ≤ 2.5) = .16 + .24 + .25 + .20 = .85.

c. Collecting the r values from the table above yields the pmf table below.

r        0     1     2     3
p(r)     .30   .40   .22   .08

d. With n = 4, there are numerous ways to get a sample average of at most 1.5, since X̄ ≤ 1.5 iff the sum of the Xi is at most 6. Listing out all options, P(X̄ ≤ 1.5) = P(1,1,1,1) + P(2,1,1,1) + ... + P(1,1,1,2) + P(1,1,2,2) + ... + P(2,2,1,1) + P(3,1,1,1) + ... + P(1,1,1,3)
= (.4)⁴ + 4(.4)³(.3) + 6(.4)²(.3)² + 4(.4)³(.2) = .2400.
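Listing every outcome by hand, as in part (d) of Exercise 41, is easy to get wrong, so a brute-force check is a reasonable safeguard. This sketch is an illustration rather than the manual's method; it uses the population pmf implied by the probability table (.4, .3, .2, .1 on the values 1 to 4), and `prob_mean_at_most` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

# Population pmf implied by the probabilities in the (x1, x2) table
pmf = {1: Fraction(4, 10), 2: Fraction(3, 10), 3: Fraction(2, 10), 4: Fraction(1, 10)}


def prob_mean_at_most(pmf, n, bound):
    """P(sample mean <= bound) for a random sample of size n, by full enumeration."""
    total = Fraction(0)
    for sample in product(pmf, repeat=n):
        if Fraction(sum(sample), n) <= bound:
            prob = Fraction(1)
            for x in sample:
                prob *= pmf[x]          # independence of the observations
            total += prob
    return total


if __name__ == "__main__":
    print(float(prob_mean_at_most(pmf, 2, Fraction(5, 2))))   # part (b)
    print(float(prob_mean_at_most(pmf, 4, Fraction(3, 2))))   # part (d)
```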


43. The statistic of interest is the fourth spread, or the difference between the medians of the upper and lower 
halves of the data. The population distribution is uniform with A = 8 and B= 10. Use a computer to 
generate samples of sizes n = 5, 10, 20, and 30 from a uniform distribution with A = 8 and B= 10. Keep 
the number of replications the same (say 500, for example). For each replication, compute the upper and 
lower fourth, then compute the difference. Plot the sampling distributions on separate histograms for n = 5, 
10, 20, and 30. 
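The simulation described in Exercise 43 could be sketched as follows. This is one possible implementation under stated assumptions, not the software the manual had in mind: the fourths are taken as the medians of the lower and upper halves of the sorted sample (each half including the median when n is odd), the seed and helper names are arbitrary, and summary statistics are printed in place of the histograms.

```python
import random
import statistics


def fourth_spread(data):
    """Upper fourth minus lower fourth of a sample."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2                      # half size; includes the median if n is odd
    lower_fourth = statistics.median(xs[:half])
    upper_fourth = statistics.median(xs[n - half:])
    return upper_fourth - lower_fourth


def simulate(n, reps=500, a=8.0, b=10.0, seed=1):
    """Fourth spreads from `reps` samples of size n drawn from Uniform(a, b)."""
    rng = random.Random(seed)
    return [fourth_spread([rng.uniform(a, b) for _ in range(n)]) for _ in range(reps)]


if __name__ == "__main__":
    for n in (5, 10, 20, 30):
        spreads = simulate(n)
        print(n, round(statistics.mean(spreads), 3), round(statistics.stdev(spreads), 3))
```

To plot the sampling distributions, the list returned by `simulate` can be passed to any histogram routine.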
45. Using Minitab to generate the necessary sampling distribution, we can see that as n increases, the distribution slowly moves toward normality. However, even the sampling distribution for n = 50 is not yet approximately normal.

[Figure: normal probability plot, n = 10]

Section 5.4


47.
a. In the previous exercise, we found E(X̄) = 70 and SD(X̄) = 0.4 when n = 16. If the diameter distribution is normal, then X̄ is also normal, so
$P(69 \le \bar{X} \le 71) = P\left(\frac{69-70}{0.4} \le Z \le \frac{71-70}{0.4}\right) = P(-2.5 \le Z \le 2.5) = \Phi(2.5) - \Phi(-2.5) = .9938 - .0062 = .9876$.
b. With n = 25, E(X̄) = 70 but $SD(\bar{X}) = \frac{1.6}{\sqrt{25}} = 0.32$ GPa. So, $P(\bar{X} > 71) = P\left(Z > \frac{71-70}{0.32}\right) = 1 - \Phi(3.125) = 1 - .9991 = .0009$.
49.
a. 11 P.M. − 6:50 P.M. = 250 minutes. With $T_0 = X_1 + \dots + X_{40}$ = total grading time, $\mu_{T_0} = n\mu = (40)(6) = 240$ and $\sigma_{T_0} = \sigma \cdot \sqrt{n} = 37.95$, so $P(T_0 \le 250) \approx P\left(Z \le \frac{250 - 240}{37.95}\right) = P(Z \le .26) = .6026$.

b. The sports report begins 260 minutes after he begins grading papers.
$P(T_0 > 260) = P\left(Z > \frac{260 - 240}{37.95}\right) = P(Z > .53) = .2981$.

51. Individual times are given by X ~ N(10, 2). For day 1, n = 5, and so
$P(\bar{X} \le 11) = P\left(Z \le \frac{11-10}{2/\sqrt{5}}\right) = P(Z \le 1.12) = .8686$.

For day 2, n = 6, and so
$P(\bar{X} \le 11) = P\left(Z \le \frac{11-10}{2/\sqrt{6}}\right) = P(Z \le 1.22) = .8888$.

Finally, assuming the results of the two days are independent (which seems reasonable), the probability the sample average is at most 11 min on both days is (.8686)(.8888) = .7720.
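As a cross-check on the table lookups in Exercise 51, the same probabilities can be computed with the standard library's `statistics.NormalDist`. This is a sketch, with `prob_mean_at_most` as an illustrative helper name; because the manual rounds z to two decimals before consulting the table, software values differ slightly in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_at_most(bound, mu, sigma, n):
    """P(sample mean <= bound) when the population is Normal(mu, sigma)."""
    return NormalDist(mu, sigma / sqrt(n)).cdf(bound)


p_day1 = prob_mean_at_most(11, mu=10, sigma=2, n=5)
p_day2 = prob_mean_at_most(11, mu=10, sigma=2, n=6)
p_both = p_day1 * p_day2        # assumes the two days are independent

if __name__ == "__main__":
    print(round(p_day1, 4), round(p_day2, 4), round(p_both, 4))
```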


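53.
a. With the values provided,
$P(\bar{X} \ge 51) = P\left(Z \ge \frac{51-50}{1.2/\sqrt{9}}\right) = P(Z \ge 2.5) = 1 - .9938 = .0062$.
b. Replace n = 9 by n = 40, and
$P(\bar{X} \ge 51) = P\left(Z \ge \frac{51-50}{1.2/\sqrt{40}}\right) = P(Z \ge 5.27) \approx 0$.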
55.
a. With Y = # of tickets, Y has approximately a normal distribution with μ = 50 and $\sigma = \sqrt{\mu} = 7.071$. So, using a continuity correction from [35, 70] to [34.5, 70.5],
$P(35 \le Y \le 70) \approx P\left(\frac{34.5 - 50}{7.071} \le Z \le \frac{70.5 - 50}{7.071}\right) = P(-2.19 \le Z \le 2.90) = .9838$.

b. Now μ = 5(50) = 250, so $\sigma = \sqrt{250} = 15.811$.
Using a continuity correction from [225, 275] to [224.5, 275.5],
$P(225 \le Y \le 275) \approx P\left(\frac{224.5 - 250}{15.811} \le Z \le \frac{275.5 - 250}{15.811}\right) = P(-1.61 \le Z \le 1.61) = .8926$.

c. Using software, part (a) $= \sum_{y=35}^{70} \frac{e^{-50}50^y}{y!} = .9862$ and part (b) $= \sum_{y=225}^{275} \frac{e^{-250}250^y}{y!} = .8934$. Both of the approximations in (a) and (b) are correct to 2 decimal places.
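The "using software" step in part (c) of Exercise 55 might look like the sketch below. It is an illustration only (the manual does not say which software was used): the Poisson terms are evaluated on the log scale with `math.lgamma` to avoid overflow in 250^y and y!, and the function names are hypothetical.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def poisson_range(mu, lo, hi):
    """Exact P(lo <= Y <= hi) for Y ~ Poisson(mu), summing pmf terms on the log scale."""
    return sum(exp(-mu + y * log(mu) - lgamma(y + 1)) for y in range(lo, hi + 1))


def normal_approx(mu, lo, hi):
    """Normal approximation to the same probability, with continuity correction."""
    dist = NormalDist(mu, sqrt(mu))
    return dist.cdf(hi + 0.5) - dist.cdf(lo - 0.5)


if __name__ == "__main__":
    for mu, lo, hi in ((50, 35, 70), (250, 225, 275)):
        print(mu, round(poisson_range(mu, lo, hi), 4), round(normal_approx(mu, lo, hi), 4))
```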


57. With the parameters provided, E(X) = αβ = 100 and V(X) = αβ² = 200. Using a normal approximation,
$P(X \le 125) \approx P\left(Z \le \frac{125 - 100}{\sqrt{200}}\right) = P(Z \le 1.77) = .9616$.

Section 5.5


59.
a. E(X1 + X2 + X3) = 180, V(X1 + X2 + X3) = 45, SD(X1 + X2 + X3) = √45 = 6.708.
$P(X_1 + X_2 + X_3 \le 200) = P\left(Z \le \frac{200 - 180}{6.708}\right) = P(Z \le 2.98) = .9986$.
P(150 ≤ X1 + X2 + X3 ≤ 200) = P(−4.47 ≤ Z ≤ 2.98) ≈ .9986.

b. $\mu_{\bar{X}} = \mu = 60$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{\sqrt{15}}{\sqrt{3}} = 2.236$, so
$P(\bar{X} \ge 55) = P\left(Z \ge \frac{55 - 60}{2.236}\right) = P(Z \ge -2.236) = .9875$ and
P(58 ≤ X̄ ≤ 62) = P(−.89 ≤ Z ≤ .89) = .6266.

c. E(X1 − .5X2 − .5X3) = μ − .5μ − .5μ = 0, while
$V(X_1 - .5X_2 - .5X_3) = \sigma_1^2 + .25\sigma_2^2 + .25\sigma_3^2 = 22.5 \Rightarrow SD(X_1 - .5X_2 - .5X_3) = 4.7434$. Thus,
$P(-10 \le X_1 - .5X_2 - .5X_3 \le 5) = P\left(\frac{-10 - 0}{4.7434} \le Z \le \frac{5 - 0}{4.7434}\right) = P(-2.11 \le Z \le 1.05) = .8531 - .0174 = .8357$.

d. E(X1 + X2 + X3) = 150, V(X1 + X2 + X3) = 36 ⇒ SD(X1 + X2 + X3) = 6, so
$P(X_1 + X_2 + X_3 \le 160) = P\left(Z \le \frac{160 - 150}{6}\right) = P(Z \le 1.67) = .9525$.

Next, we want P(X1 + X2 ≥ 2X3), or, written another way, P(X1 + X2 − 2X3 ≥ 0).
E(X1 + X2 − 2X3) = 40 + 50 − 2(60) = −30 and $V(X_1 + X_2 - 2X_3) = \sigma_1^2 + \sigma_2^2 + 4\sigma_3^2 = 78 \Rightarrow$ SD(X1 + X2 − 2X3) = 8.832, so
$P(X_1 + X_2 - 2X_3 \ge 0) = P\left(Z \ge \frac{0 - (-30)}{8.832}\right) = P(Z \ge 3.40) = .0003$.

61.
a. The marginal pmfs of X and Y are given in the solution to Exercise 7, from which E(X) = 2.8, E(Y) = .7, V(X) = 1.66, and V(Y) = .61. Thus, E(X + Y) = E(X) + E(Y) = 3.5, V(X + Y) = V(X) + V(Y) = 2.27, and the standard deviation of X + Y is 1.51.

b. E(3X + 10Y) = 3E(X) + 10E(Y) = 15.4, V(3X + 10Y) = 9V(X) + 100V(Y) = 75.94, and the standard deviation of revenue is 8.71.

63.
a. E(X1) = 1.70, E(X2) = 1.55, $E(X_1X_2) = \sum_{x_1}\sum_{x_2} x_1x_2\,p(x_1,x_2) = \dots = 3.33$, so
Cov(X1, X2) = E(X1X2) − E(X1) E(X2) = 3.33 − 2.635 = .695.

b. V(X1 + X2) = V(X1) + V(X2) + 2Cov(X1, X2) = 1.59 + 1.0875 + 2(.695) = 4.0675. This is much larger than V(X1) + V(X2), since the two variables are positively correlated.

65.
a. E(X̄ − Ȳ) = 0; $V(\bar{X} - \bar{Y}) = \frac{\sigma^2}{25} + \frac{\sigma^2}{25} = .0032 \Rightarrow \sigma_{\bar{X}-\bar{Y}} = \sqrt{.0032} = .0566$
⇒ P(−.1 ≤ X̄ − Ȳ ≤ .1) ≈ P(−1.77 ≤ Z ≤ 1.77) = .9232.
b. $V(\bar{X} - \bar{Y}) = \frac{\sigma^2}{36} + \frac{\sigma^2}{36} = .0022222 \Rightarrow \sigma_{\bar{X}-\bar{Y}} = .0471$
⇒ P(−.1 ≤ X̄ − Ȳ ≤ .1) ≈ P(−2.12 ≤ Z ≤ 2.12) = .9660. The normal curve calculations are still justified here, even though the populations are not normal, by the Central Limit Theorem (36 is a sufficiently "large" sample size).

67. Letting X1, X2, and X3 denote the lengths of the three pieces, the total length is X1 + X2 − X3. This has a normal distribution with mean value 20 + 15 − 1 = 34 and variance .25 + .16 + .01 = .42, from which the standard deviation is .6481. Standardizing gives
P(34.5 ≤ X1 + X2 − X3 ≤ 35) = P(.77 ≤ Z ≤ 1.54) = .1588.

69.
a. E(X1 + X2 + X3) = 800 + 1000 + 600 = 2400.
b. Assuming independence of X1, X2, X3, V(X1 + X2 + X3) = (16)² + (25)² + (18)² = 1205.
c. E(X1 + X2 + X3) = 2400 as before, but now V(X1 + X2 + X3)
= V(X1) + V(X2) + V(X3) + 2Cov(X1, X2) + 2Cov(X1, X3) + 2Cov(X2, X3) = 1745, from which the standard deviation is 41.77.
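71.
a. $M = a_1X_1 + a_2X_2 + W\int_0^{12} x\,dx = a_1X_1 + a_2X_2 + 72W$, so
E(M) = (5)(2) + (10)(4) + (72)(1.5) = 158 and
$\sigma_M^2 = (5)^2(.5)^2 + (10)^2(1)^2 + (72)^2(.25)^2 = 430.25 \Rightarrow \sigma_M = 20.74$.
b. $P(M \le 200) = P\left(Z \le \frac{200 - 158}{20.74}\right) = P(Z \le 2.03) = .9788$.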
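The mean and variance rules for a linear combination of independent variables, used in Exercise 71, translate directly into code. The sketch below is illustrative: the tuple names are invented here, and the coefficients, means, and standard deviations are the values that appear in the solution above.

```python
from math import sqrt
from statistics import NormalDist

# M = 5*X1 + 10*X2 + 72*W, with the means and standard deviations used in the solution
coeffs = (5, 10, 72)
means = (2, 4, 1.5)
sds = (0.5, 1, 0.25)

mean_M = sum(c * m for c, m in zip(coeffs, means))
var_M = sum((c * s) ** 2 for c, s in zip(coeffs, sds))   # valid under independence
p_at_most_200 = NormalDist(mean_M, sqrt(var_M)).cdf(200)

if __name__ == "__main__":
    print(mean_M, var_M, round(sqrt(var_M), 2), round(p_at_most_200, 4))
```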
73.
a. Both are approximately normal by the Central Limit Theorem.

b. The difference of two rvs is just an example of a linear combination, and a linear combination of normal rvs has a normal distribution, so X̄ − Ȳ has approximately a normal distribution with $\mu_{\bar{X}-\bar{Y}} = 5$ and $\sigma_{\bar{X}-\bar{Y}} = \sqrt{\frac{8^2}{40} + \frac{6^2}{35}} = 1.6213$.

c. $P(-1 \le \bar{X} - \bar{Y} \le 1) \approx P\left(\frac{-1-5}{1.6213} \le Z \le \frac{1-5}{1.6213}\right) = P(-3.70 \le Z \le -2.47) \approx .0068$.

d. $P(\bar{X} - \bar{Y} \ge 10) \approx P\left(Z \ge \frac{10 - 5}{1.6213}\right) = P(Z \ge 3.08) = .0010$. This probability is quite small, so such an occurrence is unlikely if μ1 − μ2 = 5, and we would thus doubt this claim.


Supplementary Exercises 


75.
a. p_X(x) is obtained by adding joint probabilities across the row labeled x, resulting in p_X(x) = .2, .5, .3 for x = 12, 15, 20 respectively. Similarly, from column sums p_Y(y) = .1, .35, .55 for y = 12, 15, 20 respectively.
b. P(X ≤ 15 and Y ≤ 15) = p(12,12) + p(12,15) + p(15,12) + p(15,15) = .25.
c. p_X(12) · p_Y(12) = (.2)(.1) ≠ .05 = p(12,12), so X and Y are not independent. (Almost any other (x, y) pair yields the same conclusion.)
d. $E(X + Y) = \sum\sum (x + y)\,p(x, y) = 33.35$ (or = E(X) + E(Y) = 33.35).
e. $E(|X - Y|) = \sum\sum |x - y|\,p(x, y) = \dots = 3.85$.
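77.
a. $1 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = \int_0^{20}\int_{20-x}^{30-x} kxy\,dy\,dx + \int_{20}^{30}\int_0^{30-x} kxy\,dy\,dx = \frac{81{,}250}{3} \cdot k \Rightarrow k = \frac{3}{81{,}250}$.

b. $f_X(x) = \int_{20-x}^{30-x} kxy\,dy = k(250x - 10x^2)$ for 0 ≤ x ≤ 20, and
$f_X(x) = \int_0^{30-x} kxy\,dy = k(450x - 30x^2 + \frac{1}{2}x^3)$ for 20 ≤ x ≤ 30.

By symmetry, f_Y(y) is obtained by substituting y for x in f_X(x).

Since f_X(25) > 0 and f_Y(25) > 0, but f(25, 25) = 0, f_X(x) · f_Y(y) ≠ f(x, y) for all (x, y), so X and Y are not independent.

c. $P(X + Y \le 25) = \int_0^{20}\int_{20-x}^{25-x} kxy\,dy\,dx + \int_{20}^{25}\int_0^{25-x} kxy\,dy\,dx = \frac{3}{81{,}250} \cdot \frac{230{,}625}{24} = .355$.

d. $E(X + Y) = E(X) + E(Y) = 2E(X) = 2\left\{\int_0^{20} x \cdot k(250x - 10x^2)\,dx + \int_{20}^{30} x \cdot k(450x - 30x^2 + \frac{1}{2}x^3)\,dx\right\} = 2k(351{,}666.67) = 25.969$.

e. $E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy \cdot f(x,y)\,dx\,dy = \int_0^{20}\int_{20-x}^{30-x} kx^2y^2\,dy\,dx + \int_{20}^{30}\int_0^{30-x} kx^2y^2\,dy\,dx = \frac{k}{3} \cdot \frac{33{,}250{,}000}{3} = 136.4103$, so
Cov(X, Y) = 136.4103 − (12.9845)² = −32.19.
E(X²) = E(Y²) = 204.6154, so $\sigma_X^2 = \sigma_Y^2 = 204.6154 - (12.9845)^2 = 36.0182$ and $\rho = \frac{-32.19}{36.0182} = -.894$.

f. V(X + Y) = V(X) + V(Y) + 2Cov(X, Y) = 7.66.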


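79. E(X̄ + Ȳ + Z̄) = 500 + 900 + 2000 = 3400.

$V(\bar{X} + \bar{Y} + \bar{Z}) = \frac{50^2}{365} + \frac{100^2}{365} + \frac{180^2}{365} = 123.014 \Rightarrow SD(\bar{X} + \bar{Y} + \bar{Z}) = 11.09$.

P(X̄ + Ȳ + Z̄ ≤ 3500) = P(Z ≤ 9.0) ≈ 1.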
81.
a. E(N) · μ = (10)(40) = 400 minutes.

b. We expect 20 components to come in for repair during a 4 hour period, so E(N) · μ = (20)(3.5) = 70.


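83. $0.95 = P(\mu - .02 \le \bar{X} \le \mu + .02) \approx P\left(\frac{-.02}{.1/\sqrt{n}} \le Z \le \frac{.02}{.1/\sqrt{n}}\right) = P(-.2\sqrt{n} \le Z \le .2\sqrt{n})$; since P(−1.96 ≤ Z ≤ 1.96) = .95, $.2\sqrt{n} = 1.96 \Rightarrow n = 97$. The Central Limit Theorem justifies our use of the normal distribution here.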
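The sample-size calculation in Exercise 83 generalizes to any margin and confidence level. Here is a small sketch under the same normal-approximation assumption; `sample_size` is a hypothetical helper, and the z critical value comes from `NormalDist.inv_cdf` rather than a table.

```python
from math import ceil
from statistics import NormalDist


def sample_size(sigma, margin, confidence=0.95):
    """Smallest n with P(|sample mean - mu| <= margin) >= confidence (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)


if __name__ == "__main__":
    print(sample_size(sigma=0.1, margin=0.02))   # the setting of Exercise 83
```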


85. The expected value and standard deviation of volume are 87,850 and 4370.37, respectively, so
$P(\text{volume} \le 100{,}000) = P\left(Z \le \frac{100{,}000 - 87{,}850}{4370.37}\right) = P(Z \le 2.78) = .9973$.


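87.
a. $P(12 < X < 15) = P\left(\frac{12-13}{4} < Z < \frac{15-13}{4}\right) = P(-0.25 < Z < 0.5) = .6915 - .4013 = .2902$.

b. Since individual times are normally distributed, X̄ is also normal, with the same mean μ = 13 but with standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 4/\sqrt{16} = 1$. Thus,
$P(12 < \bar{X} < 15) = P\left(\frac{12-13}{1} < Z < \frac{15-13}{1}\right) = P(-1 < Z < 2) = .9772 - .1587 = .8185$.

c. The mean is μ = 13. A sample mean X̄ based on n = 16 observations is likely to be closer to the true mean than is a single observation X. That's because both are "centered" at μ, but the decreased variability in X̄ gives it less ability to vary significantly from μ.

d. P(X̄ > 20) = 1 − Φ(7) ≈ 1 − 1 = 0.

89.
a. $V(aX + Y) = a^2\sigma_X^2 + 2a\,\text{Cov}(X,Y) + \sigma_Y^2 = a^2\sigma_X^2 + 2a\sigma_X\sigma_Y\rho + \sigma_Y^2$.
Substituting $a = \frac{\sigma_Y}{\sigma_X}$ yields $\sigma_Y^2 + 2\sigma_Y^2\rho + \sigma_Y^2 = 2\sigma_Y^2(1 + \rho) \ge 0$. This implies (1 + ρ) ≥ 0, or ρ ≥ −1.

b. The same argument as in a yields $2\sigma_Y^2(1 - \rho) \ge 0$, from which ρ ≤ 1.

c. Suppose ρ = 1. Then $V(aX - Y) = 2\sigma_Y^2(1 - \rho) = 0$, which implies that aX − Y is a constant. Solve for Y and Y = aX − (constant), which is of the form aX + b.

91.
a. With Y = X1 + X2,
$F_Y(y) = \int_0^y \left\{ \int_0^{y - x_1} \frac{1}{2^{\nu_1/2}\Gamma(\nu_1/2)} \cdot \frac{1}{2^{\nu_2/2}\Gamma(\nu_2/2)} \cdot x_1^{\nu_1/2 - 1} x_2^{\nu_2/2 - 1} e^{-(x_1 + x_2)/2}\,dx_2 \right\} dx_1$.
Differentiating with respect to y, the resulting integral in $x_1$ can be shown to be equal to
$\frac{1}{2^{(\nu_1 + \nu_2)/2}\,\Gamma((\nu_1 + \nu_2)/2)}\, y^{(\nu_1 + \nu_2)/2 - 1} e^{-y/2}$, the chi-squared pdf with ν1 + ν2 degrees of freedom, from which the result follows.

b. By a, $Z_1^2 + Z_2^2$ is chi-squared with ν = 2, so $(Z_1^2 + Z_2^2) + Z_3^2$ is chi-squared with ν = 3, etc., until $Z_1^2 + \dots + Z_n^2$ is chi-squared with ν = n.

c. $\frac{X_i - \mu}{\sigma}$ is standard normal, so $\left[\frac{X_i - \mu}{\sigma}\right]^2$ is chi-squared with ν = 1, so the sum is chi-squared with parameter ν = n.

93.
a. $V(X_1) = V(W + E_1) = \sigma_W^2 + \sigma_E^2 = V(W + E_2) = V(X_2)$ and $\text{Cov}(X_1, X_2) = \text{Cov}(W + E_1, W + E_2) = \text{Cov}(W,W) + \text{Cov}(W,E_2) + \text{Cov}(E_1,W) + \text{Cov}(E_1,E_2) = \text{Cov}(W,W) + 0 + 0 + 0 = V(W) = \sigma_W^2$.
Thus, $\rho = \frac{\sigma_W^2}{\sqrt{\sigma_W^2 + \sigma_E^2} \cdot \sqrt{\sigma_W^2 + \sigma_E^2}} = \frac{\sigma_W^2}{\sigma_W^2 + \sigma_E^2}$.

b. $\rho = \frac{1}{1 + .0001} = .9999$.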

95. $E(Y) \approx h(\mu_1, \mu_2, \mu_3, \mu_4) = 120\left[\frac{1}{10} + \frac{1}{15} + \frac{1}{20}\right] = 26$.

The partial derivatives of $h(x_1, x_2, x_3, x_4)$ with respect to x1, x2, x3, and x4 are $-\frac{x_4}{x_1^2}$, $-\frac{x_4}{x_2^2}$, $-\frac{x_4}{x_3^2}$, and $\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3}$, respectively. Substituting x1 = 10, x2 = 15, x3 = 20, and x4 = 120 gives −1.2, −.5333, −.3000, and .2167, respectively, so V(Y) ≈ (1)²(−1.2)² + (1)²(−.5333)² + (1.5)²(−.3000)² + (4.0)²(.2167)² = 2.6783, and the approximate sd of Y is 1.64.
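The propagation-of-error calculation in Exercise 95 can be written out as code. This sketch assumes, as the arithmetic above suggests, that the numbers 1, 1, 1.5, and 4.0 are the standard deviations of X1 through X4 and that the four variables are independent; the function names are illustrative.

```python
from math import sqrt


def h(x1, x2, x3, x4):
    """The function from the exercise: x4 * (1/x1 + 1/x2 + 1/x3)."""
    return x4 * (1 / x1 + 1 / x2 + 1 / x3)


def partials(x1, x2, x3, x4):
    """Partial derivatives of h with respect to x1, x2, x3, x4."""
    return (-x4 / x1**2, -x4 / x2**2, -x4 / x3**2, 1 / x1 + 1 / x2 + 1 / x3)


means = (10, 15, 20, 120)
sds = (1, 1, 1.5, 4.0)          # assumed standard deviations of X1..X4

approx_mean = h(*means)
# first-order (delta-method) variance: sum of (partial_i * sd_i)^2
approx_var = sum((d * s) ** 2 for d, s in zip(partials(*means), sds))

if __name__ == "__main__":
    print(round(approx_mean, 2), round(approx_var, 4), round(sqrt(approx_var), 2))
```

The code carries full precision in the partial derivatives, so its variance differs from the hand calculation in the fourth decimal place.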
97. Since X and Y are standard normal, each has mean 0 and variance 1.
a. Cov(X, U) = Cov(X, .6X + .8Y) = .6Cov(X, X) + .8Cov(X, Y) = .6V(X) + .8(0) = .6(1) = .6.
The covariance of X and Y is zero because X and Y are independent.
Also, V(U) = V(.6X + .8Y) = (.6)²V(X) + (.8)²V(Y) = (.36)(1) + (.64)(1) = 1. Therefore,
$\text{Corr}(X, U) = \frac{\text{Cov}(X,U)}{\sigma_X\sigma_U} = \frac{.6}{\sqrt{1}\sqrt{1}} = .6$, the coefficient on X.

b. Based on part a, for any specified ρ we want U = ρX + bY, where the coefficient b on Y has the feature that ρ² + b² = 1 (so that the variance of U equals 1). One possible option for b is $b = \sqrt{1 - \rho^2}$, from which $U = \rho X + \sqrt{1 - \rho^2}\,Y$.
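The construction in part (b) of Exercise 97 is the usual recipe for simulating a pair of standard normal variables with a chosen correlation. A minimal simulation sketch follows; the function name, seed, and sample size are arbitrary choices, and the sample correlation is computed by hand so that only the standard library is needed.

```python
import random
from math import sqrt


def simulate_corr(rho, n=50_000, seed=0):
    """Sample correlation between X and U = rho*X + sqrt(1 - rho^2)*Y for independent N(0,1) X, Y."""
    rng = random.Random(seed)
    b = sqrt(1 - rho**2)
    xs, us = [], []
    for _ in range(n):
        x, y = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(x)
        us.append(rho * x + b * y)
    mean_x, mean_u = sum(xs) / n, sum(us) / n
    sxy = sum((x - mean_x) * (u - mean_u) for x, u in zip(xs, us))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    suu = sum((u - mean_u) ** 2 for u in us)
    return sxy / sqrt(sxx * suu)


if __name__ == "__main__":
    # Should print a value close to the target correlation of 0.6.
    print(round(simulate_corr(0.6), 3))
```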
CHAPTER 6


Section 6.1 


1.
a. We use the sample mean, x̄, to estimate the population mean μ. $\hat{\mu} = \bar{x} = \frac{\sum x_i}{n} = \frac{219.80}{27} = 8.1407$.
b. We use the sample median, x̃ = 7.7 (the middle observation when arranged in ascending order).
c. We use the sample standard deviation, $s = \sqrt{s^2} = \sqrt{\frac{1860.94 - \frac{(219.8)^2}{27}}{26}} = 1.660$.
d. With "success" = observation greater than 10, x = # of successes = 4, and $\hat{p} = \frac{x}{n} = \frac{4}{27} = .1481$.
e. We use the sample (std dev)/(mean), or $\frac{s}{\bar{x}} = \frac{1.660}{8.1407} = .2039$.

3.
a. We use the sample mean, x̄ = 1.3481.
b. Because we assume normality, the mean = median, so we also use the sample mean x̄ = 1.3481. We could also easily use the sample median.
c. We use the 90th percentile of the sample: $\hat{\mu} + (1.28)\hat{\sigma} = \bar{x} + 1.28s = 1.3481 + (1.28)(.3385) = 1.7814$.
d. Since we can assume normality,
$P(X < 1.5) \approx P\left(Z < \frac{1.5 - \bar{x}}{s}\right) = P\left(Z < \frac{1.5 - 1.3481}{.3385}\right) = P(Z < .45) = .6736$.
e. The estimated standard error of $\bar{x} = \frac{\hat{\sigma}}{\sqrt{n}} = \frac{s}{\sqrt{n}} = \frac{.3385}{\sqrt{16}} = .0846$.

5. Let θ = the total audited value. Three potential estimators of θ are $\hat{\theta}_1 = N\bar{X}$, $\hat{\theta}_2 = T - N\bar{D}$, and $\hat{\theta}_3 = T \cdot \frac{\bar{X}}{\bar{Y}}$.

From the data, ȳ = 374.6, x̄ = 340.6, and d̄ = 34.0. Knowing N = 5,000 and T = 1,761,300, the three corresponding estimates are $\hat{\theta}_1 = (5{,}000)(340.6) = 1{,}703{,}000$, $\hat{\theta}_2 = 1{,}761{,}300 - (5{,}000)(34.0) = 1{,}591{,}300$, and $\hat{\theta}_3 = 1{,}761{,}300\left(\frac{340.6}{374.6}\right) = 1{,}601{,}438.281$.

7.
a. $\hat{\mu} = \bar{x} = \frac{\sum x_i}{n} = \frac{1206}{10} = 120.6$.
b. $\hat{\tau} = 10{,}000\,\hat{\mu} = 1{,}206{,}000$.
c. 8 of 10 houses in the sample used at least 100 therms (the "successes"), so $\hat{p} = \frac{8}{10} = .80$.
d. The ordered sample values are 89, 99, 103, 109, 118, 122, 125, 138, 147, 156, from which the two middle values are 118 and 122, so $\tilde{x} = \frac{118 + 122}{2} = 120.0$.
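The four point estimates in Exercise 7 can be reproduced from the ordered sample listed in part (d). The sketch below uses only the standard library; the variable names are illustrative, and the population size of 10,000 houses is the multiplier that appears in part (b).

```python
import statistics

# Ordered sample values listed in part (d)
therms = [89, 99, 103, 109, 118, 122, 125, 138, 147, 156]
n_houses = 10_000                                   # multiplier from part (b)

mean_usage = statistics.mean(therms)                # estimate of mu
total_usage = n_houses * mean_usage                 # estimate of tau
prop_at_least_100 = sum(x >= 100 for x in therms) / len(therms)
median_usage = statistics.median(therms)

if __name__ == "__main__":
    print(mean_usage, round(total_usage), prop_at_least_100, median_usage)
```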
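9.
a. E(X̄) = μ = E(X), so X̄ is an unbiased estimator for the Poisson parameter μ. Since n = 150,
$\hat{\mu} = \bar{x} = \frac{\sum x_i}{n} = \frac{(0)(18) + (1)(37) + \dots + (7)(1)}{150} = \frac{317}{150} = 2.11$.
b. $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{\sqrt{\mu}}{\sqrt{n}}$, so the estimated standard error is $\sqrt{\frac{\hat{\mu}}{n}} = \sqrt{\frac{2.11}{150}} = .119$.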
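11.
a. $E\left(\frac{X_1}{n_1} - \frac{X_2}{n_2}\right) = \frac{1}{n_1}E(X_1) - \frac{1}{n_2}E(X_2) = \frac{1}{n_1}(n_1p_1) - \frac{1}{n_2}(n_2p_2) = p_1 - p_2$.
b. $V\left(\frac{X_1}{n_1} - \frac{X_2}{n_2}\right) = V\left(\frac{X_1}{n_1}\right) + V\left(\frac{X_2}{n_2}\right) = \left(\frac{1}{n_1}\right)^2 V(X_1) + \left(\frac{1}{n_2}\right)^2 V(X_2) = \frac{1}{n_1^2}(n_1p_1q_1) + \frac{1}{n_2^2}(n_2p_2q_2) = \frac{p_1q_1}{n_1} + \frac{p_2q_2}{n_2}$, and the standard error is the square root of this quantity.
c. With $\hat{p}_1 = \frac{x_1}{n_1}$, $\hat{q}_1 = 1 - \hat{p}_1$, $\hat{p}_2 = \frac{x_2}{n_2}$, $\hat{q}_2 = 1 - \hat{p}_2$, the estimated standard error is $\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$.
d. $(\hat{p}_1 - \hat{p}_2) = \frac{127}{200} - \frac{176}{200} = .635 - .880 = -.245$.
e. $\sqrt{\frac{(.635)(.365)}{200} + \frac{(.880)(.120)}{200}} = .041$.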

2 3}! 
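13. $\mu = E(X) = \int_{-1}^{1} x \cdot \frac{1}{2}(1 + \theta x)\,dx = \left[\frac{x^2}{4} + \frac{\theta x^3}{6}\right]_{-1}^{1} = \frac{1}{3}\theta \Rightarrow \theta = 3\mu \Rightarrow$
$\hat{\theta} = 3\bar{X} \Rightarrow E(\hat{\theta}) = E(3\bar{X}) = 3E(\bar{X}) = 3\mu = 3\left(\frac{1}{3}\right)\theta = \theta$.

15.
a. $E(X^2) = 2\theta$ implies that $E\left(\frac{X^2}{2}\right) = \theta$. Consider $\hat{\theta} = \frac{\sum X_i^2}{2n}$. Then
$E(\hat{\theta}) = E\left(\frac{\sum X_i^2}{2n}\right) = \frac{\sum E(X_i^2)}{2n} = \frac{\sum 2\theta}{2n} = \frac{2n\theta}{2n} = \theta$, implying that θ̂ is an unbiased estimator for θ.
b. $\sum x_i^2 = 1490.1058$, so $\hat{\theta} = \frac{1490.1058}{20} = 74.505$.

17.
a. $E(\hat{p}) = \sum_{x=0}^{\infty} \frac{r-1}{x+r-1} \cdot \binom{x+r-1}{x} p^r (1-p)^x = p\sum_{x=0}^{\infty} \frac{(x+r-2)!}{x!\,(r-2)!}\, p^{r-1}(1-p)^x = p\sum_{x=0}^{\infty} nb(x; r-1, p) = p$.
b. For the given sequence, x = 5, so $\hat{p} = \frac{5-1}{5+5-1} = \frac{4}{9} = .444$.

19.
a. λ = .5p + .15 ⇒ 2λ = p + .3, so p = 2λ − .3 and $\hat{p} = 2\hat{\lambda} - .3 = 2\left(\frac{Y}{n}\right) - .3$; the estimate is $2\left(\frac{20}{80}\right) - .3 = .2$.

b. $E(\hat{p}) = E(2\hat{\lambda} - .3) = 2E(\hat{\lambda}) - .3 = 2\lambda - .3 = p$, as desired.

c. Here λ = .7p + (.3)(.3), so $p = \frac{10}{7}\lambda - \frac{9}{70}$ and $\hat{p} = \frac{10}{7}\left(\frac{Y}{n}\right) - \frac{9}{70}$.

Section 6.2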


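21.
a. $E(X) = \beta \cdot \Gamma\left(1 + \frac{1}{\alpha}\right)$ and $E(X^2) = V(X) + [E(X)]^2 = \beta^2\,\Gamma\left(1 + \frac{2}{\alpha}\right)$, so the moment estimators α̂ and β̂ are the solution to $\bar{X} = \hat{\beta} \cdot \Gamma\left(1 + \frac{1}{\hat{\alpha}}\right)$, $\frac{1}{n}\sum X_i^2 = \hat{\beta}^2\,\Gamma\left(1 + \frac{2}{\hat{\alpha}}\right)$. Thus $\hat{\beta} = \frac{\bar{X}}{\Gamma(1 + 1/\hat{\alpha})}$, so once α̂ has been determined, $\Gamma\left(1 + \frac{1}{\hat{\alpha}}\right)$ is evaluated and β̂ then computed. Since $\bar{X}^2 = \hat{\beta}^2 \cdot \Gamma^2\left(1 + \frac{1}{\hat{\alpha}}\right)$,
$\frac{\frac{1}{n}\sum X_i^2}{\bar{X}^2} = \frac{\Gamma(1 + 2/\hat{\alpha})}{\Gamma^2(1 + 1/\hat{\alpha})}$, so this equation must be solved to obtain α̂.
b. From a, $\frac{1}{20}\left(\frac{16{,}500}{28.0^2}\right) = 1.05 = \frac{\Gamma(1 + 2/\hat{\alpha})}{\Gamma^2(1 + 1/\hat{\alpha})}$, so $\frac{1}{1.05} = .95 = \frac{\Gamma^2(1 + 1/\hat{\alpha})}{\Gamma(1 + 2/\hat{\alpha})}$, and from the hint, $\frac{1}{\hat{\alpha}} = .2 \Rightarrow \hat{\alpha} = 5$. Then $\hat{\beta} = \frac{\bar{x}}{\Gamma(1.2)} = \frac{28.0}{\Gamma(1.2)}$.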
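23. Determine the joint pdf (aka the likelihood function), take a logarithm, and then use calculus:
$f(x_1, \dots, x_n \mid \theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\theta}} e^{-x_i^2/2\theta} = (2\pi\theta)^{-n/2} e^{-\sum x_i^2/2\theta}$
$\ell(\theta) = \ln[f(x_1, \dots, x_n \mid \theta)] = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\theta) - \sum x_i^2/2\theta$
$\ell'(\theta) = 0 - \frac{n}{2\theta} + \sum x_i^2/2\theta^2 = 0 \Rightarrow -n\theta + \sum x_i^2 = 0$
Solving for θ, the maximum likelihood estimator is $\hat{\theta} = \frac{1}{n}\sum X_i^2$.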
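25.
a. $\hat{\mu} = \bar{x} = 384.4$; s² = 395.16, so $\frac{1}{n}\sum (x_i - \bar{x})^2 = \hat{\sigma}^2 = \frac{9}{10}(395.16) = 355.64$ and $\hat{\sigma} = \sqrt{355.64} = 18.86$ (this is not s).

b. The 95th percentile is μ + 1.645σ, so the mle of this is (by the invariance principle) $\hat{\mu} + 1.645\hat{\sigma} = 415.42$.

c. The mle of P(X ≤ 400) is, by the invariance principle, $\Phi\left(\frac{400 - \hat{\mu}}{\hat{\sigma}}\right) = \Phi\left(\frac{400 - 384.4}{18.86}\right) = \Phi(0.83) = .7967$.

27.
a. $f(x_1, \dots, x_n; \alpha, \beta) = \frac{(x_1x_2 \cdots x_n)^{\alpha - 1} e^{-\sum x_i/\beta}}{\beta^{n\alpha}\,\Gamma^n(\alpha)}$, so the log likelihood is
$(\alpha - 1)\sum \ln(x_i) - \frac{\sum x_i}{\beta} - n\alpha\ln(\beta) - n\ln\Gamma(\alpha)$. Equating both $\frac{d}{d\alpha}$ and $\frac{d}{d\beta}$ to 0 yields
$\sum \ln(x_i) - n\ln(\beta) - n\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} = 0$ and $\frac{\sum x_i}{\beta^2} - \frac{n\alpha}{\beta} = 0$, a very difficult system of equations to solve.
b. From the second equation in a, $\frac{\sum x_i}{\beta} = n\alpha \Rightarrow \bar{x} = \alpha\beta = \mu$, so the mle of μ is $\hat{\mu} = \bar{X}$.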
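29.
a. The joint pdf (likelihood function) is
$f(x_1, \dots, x_n; \lambda, \theta) = \lambda^n e^{-\lambda\sum(x_i - \theta)}$ if $x_1 \ge \theta, \dots, x_n \ge \theta$, and 0 otherwise.
Notice that $x_1 \ge \theta, \dots, x_n \ge \theta$ iff $\min(x_i) \ge \theta$, and that $-\lambda\sum(x_i - \theta) = -\lambda\sum x_i + n\lambda\theta$.
Thus likelihood = $\lambda^n \exp(-\lambda\sum x_i)\exp(n\lambda\theta)$ if $\min(x_i) \ge \theta$, and 0 if $\min(x_i) < \theta$.
Consider maximization with respect to θ. Because the exponent nλθ is positive, increasing θ will increase the likelihood provided that $\min(x_i) \ge \theta$; if we make θ larger than $\min(x_i)$, the likelihood drops to 0. This implies that the mle of θ is $\hat{\theta} = \min(x_i)$. The log likelihood is now $n\ln(\lambda) - \lambda\sum(x_i - \hat{\theta})$. Equating the derivative w.r.t. λ to 0 and solving yields $\hat{\lambda} = \frac{n}{\sum(x_i - \hat{\theta})} = \frac{n}{\sum x_i - n\hat{\theta}}$.
b. $\hat{\theta} = \min(x_i) = .64$, and $\sum x_i = 55.80$, so $\hat{\lambda} = \frac{10}{55.80 - 6.4} = .202$.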


Supplementary Exercises 


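31. Substitute $k = \varepsilon/\sigma_Y$ into Chebyshev's inequality to write $P(|Y - \mu_Y| \ge \varepsilon) \le \frac{1}{(\varepsilon/\sigma_Y)^2} = \frac{V(Y)}{\varepsilon^2}$. Since E(X̄) = μ and V(X̄) = σ²/n, we may then write $P(|\bar{X} - \mu| \ge \varepsilon) \le \frac{\sigma^2/n}{\varepsilon^2}$. As n → ∞, this fraction converges to 0, hence $P(|\bar{X} - \mu| \ge \varepsilon) \to 0$, as desired.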


33. Let x1 = the time until the first birth, x2 = the elapsed time between the first and second births, and so on. Then $f(x_1, \dots, x_n; \lambda) = \lambda e^{-\lambda x_1} \cdot (2\lambda)e^{-2\lambda x_2} \cdots (n\lambda)e^{-n\lambda x_n} = n!\,\lambda^n e^{-\lambda\sum kx_k}$. Thus the log likelihood is $\ln(n!) + n\ln(\lambda) - \lambda\sum kx_k$. Taking $\frac{d}{d\lambda}$ and equating to 0 yields $\hat{\lambda} = \frac{n}{\sum kx_k}$.

For the given sample, n = 6, x1 = 25.2, x2 = 41.7 − 25.2 = 16.5, x3 = 9.5, x4 = 4.3, x5 = 4.0, x6 = 2.3; so
$\sum_{k=1}^{6} kx_k = (1)(25.2) + (2)(16.5) + \dots + (6)(2.3) = 137.7$ and $\hat{\lambda} = \frac{6}{137.7} = .0436$.
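The closed-form estimate in Exercise 33 is simple to evaluate in code. The sketch below uses the six inter-birth times listed in the solution; `birth_rate_mle` is an illustrative name, not something defined in the text.

```python
def birth_rate_mle(gaps):
    """MLE of lambda when the k-th waiting time is exponential with rate k*lambda."""
    weighted_sum = sum(k * x for k, x in enumerate(gaps, start=1))
    return len(gaps) / weighted_sum


# x1..x6 from the solution: times between successive births
gaps = [25.2, 16.5, 9.5, 4.3, 4.0, 2.3]

if __name__ == "__main__":
    print(round(birth_rate_mle(gaps), 4))
```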
35.
xi + xj    23.5   26.3   28.0   28.2   29.4   29.5   30.6   31.6   33.9   49.3
23.5       23.5   24.9   25.75  25.85  26.45  26.5   27.05  27.55  28.7   36.4
26.3              26.3   27.15  27.25  27.85  27.9   28.45  28.95  30.1   37.8
28.0                     28.0   28.1   28.7   28.75  29.3   29.8   30.95  38.65
28.2                            28.2   28.8   28.85  29.4   29.9   31.05  38.75
29.4                                   29.4   29.45  30.0   30.5   31.65  39.35
29.5                                          29.5   30.05  30.55  31.7   39.4
30.6                                                 30.6   31.1   32.25  39.95
31.6                                                        31.6   32.75  40.45
33.9                                                               33.9   41.6
49.3                                                                      49.3
There are 55 averages, so the median is the 28th in order of increasing magnitude. Therefore, $\hat{\mu} = 29.5$.
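The table of pairwise averages in Exercise 35 can be generated rather than filled in by hand. This sketch computes the same estimator, the median of all pairwise averages (including each observation paired with itself); the function name is an illustrative choice.

```python
from itertools import combinations_with_replacement
from statistics import median


def median_of_pairwise_averages(data):
    """Median of (x_i + x_j)/2 over all pairs with i <= j."""
    averages = [(a + b) / 2 for a, b in combinations_with_replacement(data, 2)]
    return median(averages), len(averages)


data = [23.5, 26.3, 28.0, 28.2, 29.4, 29.5, 30.6, 31.6, 33.9, 49.3]

if __name__ == "__main__":
    estimate, count = median_of_pairwise_averages(data)
    print(count, estimate)
```

Unlike the sample mean, this estimate is barely affected by the outlying value 49.3.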
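37. Let $c = \frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n}{2}\right) \cdot \sqrt{\frac{2}{n-1}}}$. Then E(cS) = cE(S), and c cancels with the two Γ factors and the square root in E(S), leaving just σ. When n = 20, $c = \frac{\Gamma(9.5)}{\Gamma(10) \cdot \sqrt{2/19}} = \frac{(8.5)(7.5)\cdots(.5)\,\Gamma(.5)}{(10-1)!\,\sqrt{2/19}} = \frac{(8.5)(7.5)\cdots(.5)\sqrt{\pi}}{9!\,\sqrt{2/19}} = 1.0132$.

CHAPTER 7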


Section 7.1 


1.
a. $z_{\alpha/2} = 2.81$ implies that α/2 = 1 − Φ(2.81) = .0025, so α = .005 and the confidence level is 100(1 − α)% = 99.5%.

b. $z_{\alpha/2} = 1.44$ implies that α = 2[1 − Φ(1.44)] = .15, and the confidence level is 100(1 − α)% = 85%.

c. 99.7% confidence implies that α = .003, α/2 = .0015, and $z_{.0015} = 2.96$. (Look for cumulative area equal to 1 − .0015 = .9985 in the main body of table A.3.) Or, just use z ≈ 3 by the empirical rule.

d. 75% confidence implies α = .25, α/2 = .125, and $z_{.125} = 1.15$.


3.
a. A 90% confidence interval will be narrower. The z critical value for a 90% confidence level is 1.645, smaller than the z of 1.96 for the 95% confidence level, thus producing a narrower interval.

b. Not a correct statement. Once an interval has been created from a sample, the mean μ is either enclosed by it, or not. We have 95% confidence in the general procedure, under repeated and independent sampling.

c. Not a correct statement. The interval is an estimate for the population mean, not a boundary for population values.

d. Not a correct statement. In theory, if the process were repeated an infinite number of times, 95% of the intervals would contain the population mean μ. We expect 95 out of 100 intervals will contain μ, but we don't know this to be true.


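5.
a. $4.85 \pm \frac{(1.96)(.75)}{\sqrt{20}} = 4.85 \pm .33 = (4.52, 5.18)$.

b. $z_{\alpha/2} = z_{.01} = 2.33$, so the interval is $4.56 \pm \frac{(2.33)(.75)}{\sqrt{16}} = (4.12, 5.00)$.

c. $n = \left[\frac{2(1.96)(.75)}{.40}\right]^2 = 54.02 \nearrow 55$.

d. Width w = 2(.2) = .4, so $n = \left[\frac{2(2.58)(.75)}{.4}\right]^2 = 93.61 \nearrow 94$.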
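The interval and sample-size formulas used in Exercise 5 are short enough to wrap in two helpers. This is a sketch with illustrative function names; the z critical values are passed in explicitly so the results line up with the table values used above.

```python
from math import ceil, sqrt


def z_interval(xbar, sigma, n, z):
    """Two-sided z confidence interval for a mean with known sigma."""
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


def sample_size_for_width(sigma, width, z):
    """Smallest n for which the z interval has total width at most `width`."""
    return ceil((2 * z * sigma / width) ** 2)


if __name__ == "__main__":
    low, high = z_interval(4.85, 0.75, 20, z=1.96)
    print(round(low, 2), round(high, 2))                 # part (a)
    print(sample_size_for_width(0.75, 0.40, z=1.96))     # part (c)
    print(sample_size_for_width(0.75, 0.40, z=2.58))     # part (d)
```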
Chapter 7: Statistical Intervals Based on a Single Sample


7. If $L = 2z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ and we increase the sample size by a factor of 4, the new length is
$L' = 2z_{\alpha/2}\frac{\sigma}{\sqrt{4n}} = \left[2z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]\left(\frac{1}{2}\right) = \frac{L}{2}$. Thus halving the length requires n to be increased fourfold. If n′ = 25n, then $L' = \frac{L}{5}$, so the length is decreased by a factor of 5.

9.
a. $\left(\bar{x} - 1.645\frac{\sigma}{\sqrt{n}}, \infty\right)$. From 5a, x̄ = 4.85, σ = .75, and n = 20; $4.85 - 1.645\frac{.75}{\sqrt{20}} = 4.5741$, so the interval is (4.5741, ∞).

b. $\left(\bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}}, \infty\right)$.

c. $\left(-\infty, \bar{x} + z_{\alpha}\frac{\sigma}{\sqrt{n}}\right)$. From 4a, x̄ = 58.3, σ = 3.0, and n = 25; $58.3 + 2.33\frac{3}{\sqrt{25}} = 59.70$, so the interval is (−∞, 59.70).

11. Y is a binomial rv with n = 1000 and p = .95, so E(Y) = np = 950, the expected number of intervals that capture μ, and $\sigma_Y = \sqrt{npq} = 6.892$. Using the normal approximation to the binomial distribution,
P(940 ≤ Y ≤ 960) = P(939.5 ≤ Y ≤ 960.5) ≈ P(−1.52 ≤ Z ≤ 1.52) = .9357 − .0643 = .8714.

Section 7.2

13.
a. $\bar{x} \pm z_{.025}\frac{s}{\sqrt{n}} = 654.16 \pm 1.96\frac{164.43}{\sqrt{50}} = (608.58, 699.74)$. We are 95% confident that the true average CO2 level in this population of homes with gas cooking appliances is between 608.58 ppm and 699.74 ppm.

b. $w = 50 = \frac{2(1.96)(175)}{\sqrt{n}} \Rightarrow \sqrt{n} = \frac{2(1.96)(175)}{50} = 13.72 \Rightarrow n = (13.72)^2 = 188.24$, which rounds up to 189.

15.
a. $z_{\alpha} = .84$, and Φ(.84) = .7995 ≈ .80, so the confidence level is 80%.

b. $z_{\alpha} = 2.05$, and Φ(2.05) = .9798 ≈ .98, so the confidence level is 98%.

c. $z_{\alpha} = .67$, and Φ(.67) = .7486 ≈ .75, so the confidence level is 75%.

17. $\bar{x} - z_{.01}\frac{s}{\sqrt{n}} = 135.39 - 2.33\frac{4.59}{\sqrt{153}} = 135.39 - .865 = 134.53$. We are 99% confident that the true average ultimate tensile strength is greater than 134.53.

19. $\hat{p} = \frac{201}{356} = .5646$; we calculate a 95% confidence interval for the proportion of all dies that pass the probe:
$\frac{.5646 + \frac{(1.96)^2}{2(356)} \pm 1.96\sqrt{\frac{(.5646)(.4354)}{356} + \frac{(1.96)^2}{4(356)^2}}}{1 + \frac{(1.96)^2}{356}} = \frac{.5700 \pm .0518}{1.01079} = (.513, .615)$. The simpler CI formula (7.11) gives $.5646 \pm 1.96\sqrt{\frac{(.5646)(.4354)}{356}} = (.513, .616)$, which is almost identical.

21. For a one-sided bound, we need $z_{\alpha} = z_{.05} = 1.645$; $\hat{p} = \frac{250}{1000} = .25$; and $\tilde{p} = \frac{.25 + 1.645^2/2000}{1 + 1.645^2/1000} = .2507$. The resulting 95% upper confidence bound for p, the true proportion of such consumers who never apply for a rebate, is
$.2507 + \frac{1.645\sqrt{(.25)(.75)/1000 + (1.645)^2/(4 \cdot 1000^2)}}{1 + (1.645)^2/1000} = .2507 + .0225 = .2732$.

Yes, there is compelling evidence the true proportion is less than 1/3 (.3333), since we are 95% confident this true proportion is less than .2732.
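The score interval for a proportion used in Exercises 19 and 21 is tedious by hand, so a small helper is a natural companion. The sketch below is illustrative: the function names are invented, and the count 201 for Exercise 19 is an assumption inferred from p̂ = .5646 with n = 356.

```python
from math import sqrt


def score_interval(x, n, z=1.96):
    """Two-sided score confidence interval for a proportion."""
    p_hat = x / n
    center = p_hat + z * z / (2 * n)
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (center - half) / denom, (center + half) / denom


def score_upper_bound(x, n, z=1.645):
    """One-sided upper confidence bound from the same formula."""
    return score_interval(x, n, z)[1]


if __name__ == "__main__":
    low, high = score_interval(201, 356)          # assumed count for Exercise 19
    print(round(low, 3), round(high, 3))
    print(round(score_upper_bound(250, 1000), 4)) # Exercise 21
```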


23.
a. With such a large sample size, we can use the "simplified" CI formula (7.11). With p̂ = .25, n = 2003, and $z_{\alpha/2} = z_{.005} = 2.576$, the 99% confidence interval for p is
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = .25 \pm 2.576\sqrt{\frac{(.25)(.75)}{2003}} = .25 \pm .025 = (.225, .275)$.

b. Using the "simplified" formula for sample size and p̂ = q̂ = .5,
$n = \frac{4z^2\hat{p}\hat{q}}{w^2} = \frac{4(2.576)^2(.5)(.5)}{(.05)^2} = 2654.31$.
So, a sample of size at least 2655 is required. (We use p̂ = q̂ = .5 here, rather than the values from the sample data, so that our CI has the desired width irrespective of what the true value of p might be. See the textbook discussion toward the end of Section 7.2.)

25. 


a n= 
27. Note that the midpoint of the new interval is $\frac{x + z^2/2}{n + z^2}$, which is roughly $\frac{x + 2}{n + 4}$ with a confidence level of 95% and approximating 1.96 ≈ 2. The variance of this quantity is $\frac{np(1-p)}{(n + z^2)^2}$, or roughly $\frac{p(1-p)}{n + 4}$. Now replacing p with $\frac{x+2}{n+4}$, we have
$\left(\frac{x+2}{n+4}\right) \pm z_{\alpha/2}\sqrt{\frac{\left(\frac{x+2}{n+4}\right)\left(1 - \frac{x+2}{n+4}\right)}{n+4}}$. For clarity, let x* = x + 2 and n* = n + 4, then $\hat{p}^* = \frac{x^*}{n^*}$ and the formula reduces to $\hat{p}^* \pm z_{\alpha/2}\sqrt{\frac{\hat{p}^*\hat{q}^*}{n^*}}$, the desired conclusion. For further discussion, see the Agresti article.


Section 7.3 


29.
a. $t_{.025,10} = 2.228$
b. $t_{.025,20} = 2.086$
c. $t_{.005,20} = 2.845$
d. $t_{.005,50} = 2.678$
e. $t_{.01,25} = 2.485$
f. $-t_{.025,5} = -2.571$

31.
a. $t_{.05,10} = 1.812$
b. $t_{.05,15} = 1.753$
c. $t_{.01,15} = 2.602$
d. $t_{.01,4} = 3.747$
e. $t_{.02,24} \approx t_{.025,24} = 2.064$
f. $t_{.01,37} \approx 2.429$

33.
a. The boxplot indicates a very slight positive skew, with no outliers. The data appears to center near 438.

[Figure: boxplot of the data, horizontal axis from 420 to 470]

b. Based on a normal probability plot, it is reasonable to assume the sample observations came from a normal distribution.

c. With df = n − 1 = 16, the critical value for a 95% CI is $t_{.025,16} = 2.120$, and the interval is
$438.29 \pm (2.120)\left(\frac{15.14}{\sqrt{17}}\right) = 438.29 \pm 7.785 = (430.51, 446.08)$. Since 440 is within the interval, 440 is a plausible value for the true mean. 450, however, is not, since it lies outside the interval.

35. n = 15, x̄ = 25.0, s = 3.5; $t_{.025,14} = 2.145$
a. A 95% CI for the mean: $25.0 \pm 2.145\frac{3.5}{\sqrt{15}} = (23.06, 26.94)$.

b. A 95% prediction interval: $25.0 \pm 2.145(3.5)\sqrt{1 + \frac{1}{15}} = (17.25, 32.75)$. The prediction interval is about 4 times wider than the confidence interval.

37.
a. A 95% CI: .9255 ± 2.093(.0181) = .9255 ± .0379 ⇒ (.8876, .9634)
b. A 95% P.I.: $.9255 \pm 2.093(.0809)\sqrt{1 + \frac{1}{20}} = .9255 \pm .1735 \Rightarrow (.7520, 1.0990)$
c. A tolerance interval is requested, with k = 99, confidence level 95%, and n = 20. The tolerance critical value, from Table A.6, is 3.615. The interval is .9255 ± 3.615(.0809) ⇒ (.6330, 1.2180).
39.
a. Based on the plot, generated by Minitab, it is plausible that the population distribution is normal.

[Figure: normal probability plot of volume. Average: 52.2308, StDev: 14.8557, N: 13; Anderson-Darling Normality Test: A-Squared: 0.360, P-Value: 0.392]

b. We require a tolerance interval. From table A.6, with 95% confidence, k = 95, and n = 13, the tolerance critical value is 3.081. x̄ ± 3.081s = 52.231 ± 3.081(14.856) = 52.231 ± 45.771 ⇒ (6.460, 98.002).

c. A prediction interval, with $t_{.025,12} = 2.179$:
$52.231 \pm 2.179(14.856)\sqrt{1 + \frac{1}{13}} = 52.231 \pm 33.593 \Rightarrow (18.638, 85.824)$

41. The 20 df row of Table A.5 shows that 1.725 captures upper tail area .05 and 1.325 captures upper tail area .10. The confidence level for each interval is 100(central area)%.
For the first interval, central area = 1 − sum of tail areas = 1 − (.25 + .05) = .70, and for the second and third intervals the central areas are 1 − (.20 + .10) = .70 and 1 − (.15 + .15) = .70. Thus each interval has confidence level 70%. The width of the first interval is $\frac{s(.687 + 1.725)}{\sqrt{n}} = 2.412\frac{s}{\sqrt{n}}$, whereas the widths of the second and third intervals are 2.185 and 2.128 standard errors respectively. The third interval, with symmetrically placed critical values, is the shortest, so it should be used. This will always be true for a t interval.


Section 7.4 


43.
a. $\chi^2_{.05,10} = 18.307$

b. $\chi^2_{.95,10} = 3.940$

c. Since $10.987 = \chi^2_{.975,22}$ and $36.78 = \chi^2_{.025,22}$, $P(\chi^2_{.975,22} \le \chi^2 \le \chi^2_{.025,22}) = .95$.

d. Since $14.611 = \chi^2_{.95,25}$ and $37.652 = \chi^2_{.05,25}$, $P(\chi^2 < 14.611 \text{ or } \chi^2 > 37.652) = 1 - P(\chi^2 > 14.611) + P(\chi^2 > 37.652) = (1 - .95) + .05 = .10$.

45. For the n = 8 observations provided, the sample standard deviation is s = 8.2115. A 99% CI for the population variance, σ², is given by
$\left((n-1)s^2/\chi^2_{.005,n-1},\ (n-1)s^2/\chi^2_{.995,n-1}\right) = (7 \cdot 8.2115^2/20.276,\ 7 \cdot 8.2115^2/0.989) = (23.28, 477.25)$
Taking square roots, a 99% CI for σ is (4.82, 21.85). Validity of this interval requires that coating layer thickness be (at least approximately) normally distributed.


Supplementary Exercises 


47. 
a. n=48, ¥=8.079a, s* = 23.7017, and s = 4,868. 


A 95% Cl for = the true average strength is 


£41.96 =8.079+1.96 888 - 8.079 41.377 = (6.702,9.456). 
vn 48 
ois 
b daa Wins is Satine? og 
2 2 
2708+ aa)! ie 
: (48) = :3108+-1319 _( 166, 410) 
1.96 1.0800 
1+ 
48 
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; ee he R & .2+70, 
The sample mean is the midpoint of the interval: x = an = 65.4 N. The 95% confidence margin of 


error for the mean must have been + 5.2, so t-s/ Vn =5.2. The 95% confidence margin of error for a 
prediction interval (i.e., an individual) is t-sJl+4+ = Vn+l-t-s/ Vn =V11+1(5.2) = 18.0. Thus, the 95% 
PI is 65.4 + 18.0 = (47.4 N, 83.4N). (You could also determine ¢ from n and a, then s separately.) 


_ 352 +1.96? / 2(88) 


a. With p=31/88 =.352, p= = 358, and the Cl is 


1+1.967 /88 
V Q ; 1.96? / 4(88)° 
958 41.96 Meo ONS #196 TABBY" = (.260, .456). We are 95% confident that between 26.0% 


1+1.96° / 88 
and 45.6% of all athletes under these conditions have an exercise-induced laryngeal obstruction. 


b. Using the “simplified” formula, n= 4 Pd = aX = 2401. So, roughly 2400 people should be 
w A 
surveyed to assure a width no more than .04 with 95% confidence. Using Equation (7.12) gives the 
almost identical n = 2398. 


c. No. The upper bound in (a) uses a z-value of 1.96 = Zops. So, if this is used as an upper bound (and 
hence .025 equals a rather than a/2), it gives a (1 — .025) = 97.5% upper bound. If we want a 95% 
confidence upper bound for p, 1.96 should be replaced by the critical value 295 = 1.645. 


With 6=4(X,+%,+X,)-X. of =4V (K+ K+ ¥,)+V (Hs) = Ne :, is 


n, n, My 


For the given data, 6 =-.SQand G; = 1718, so the interval is —.50+1.96(.1718)= (—.84,—.16) ‘ 


2(1.96)(.8) | 
The specified condition is that the interval be length .2, so n= ae = 245.86 7 246. 


Proceeding as in Example 7.5 with T_r replacing ΣX_i, the CI for 1/λ has endpoints 2t_r/χ²_{α/2,2r} and
2t_r/χ²_{1−α/2,2r}, where t_r = y₁ + … + y_r + (n − r)y_r. In Example 6.7, n = 20, r = 10, and t_r = 1115. With df = 20, the necessary
critical values are 9.591 and 34.170, giving the interval (2(1115)/34.170, 2(1115)/9.591) = (65.3, 232.5). This is
obviously an extremely wide interval. The censored experiment provides less information about 1/λ than
would an uncensored experiment with n = 20.


59.
a. The integral of n·u^(n−1) du from (α/2)^(1/n) to (1 − α/2)^(1/n) equals u^n evaluated between those limits,
which is (1 − α/2) − α/2 = 1 − α. From the probability statement,
(α/2)^(1/n)/max(X_i) ≤ 1/θ ≤ (1 − α/2)^(1/n)/max(X_i) with probability 1 − α, so taking the reciprocal of each endpoint and
interchanging gives the CI (max(X_i)/(1 − α/2)^(1/n), max(X_i)/(α/2)^(1/n)) for θ.


b. α^(1/n) ≤ max(X_i)/θ ≤ 1 with probability 1 − α, so 1 ≤ θ/max(X_i) ≤ 1/α^(1/n) with probability 1 − α, which yields
the interval (max(X_i), max(X_i)/α^(1/n)).


c. It is easily verified that the interval of b is shorter: draw a graph of f_U(u) and verify that the
shortest interval which captures area 1 − α under the curve is the rightmost such interval, which leads
to the CI of b. With α = .05, n = 5, and max(x_i) = 4.2, this yields (4.2, 7.65).
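
The comparison in part (c) can be illustrated numerically. The following Python sketch evaluates both intervals for the values given in the text (α = .05, n = 5, max(x_i) = 4.2); the function names are hypothetical and chosen only for this illustration.

```python
def rightmost_ci(x_max, n, alpha):
    """Interval of part (b): (max, max / alpha^(1/n))."""
    return x_max, x_max / alpha ** (1 / n)

def equal_tail_ci(x_max, n, alpha):
    """Interval of part (a): (max / (1 - alpha/2)^(1/n), max / (alpha/2)^(1/n))."""
    return x_max / (1 - alpha / 2) ** (1 / n), x_max / (alpha / 2) ** (1 / n)

lo_b, hi_b = rightmost_ci(4.2, 5, 0.05)
lo_a, hi_a = equal_tail_ci(4.2, 5, 0.05)

print(round(lo_b, 2), round(hi_b, 2))      # interval of part (b)
print(round(lo_a, 2), round(hi_a, 2))      # interval of part (a)
print((hi_b - lo_b) < (hi_a - lo_a))       # is the part (b) interval shorter?
```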


61. x̃ = 76.2, the lower and upper fourths are 73.5 and 79.7, respectively, and f_s = 6.2. The robust interval is
76.2 ± (1.93)(6.2/√22) = 76.2 ± 2.6 = (73.6, 78.8).
x̄ = 77.33, s = 5.037, and t.025,21 = 2.080, so the t interval is
77.33 ± (2.080)(5.037/√22) = 77.33 ± 2.23 = (75.1, 79.6). The t interval is centered at x̄, which is pulled out
to the right of x̃ by the single mild outlier 93.7; the interval widths are comparable.


CHAPTER 8


Section 8.1 


11. 


a. Yes. It is an assertion about the value of a parameter. 
b. No. The sample median x̃ is not a parameter.
c. No. The sample standard deviation s is not a parameter.


d. Yes. The assertion is that the standard deviation of population #2 exceeds that of population #1.


e. No. X̄ and Ȳ are statistics rather than parameters, so they cannot appear in a hypothesis.


f. Yes. H is an assertion about the value of a parameter.


We reject H0 iff P-value ≤ α = .05.
a. Reject H0  b. Reject H0  c. Do not reject H0  d. Reject H0  e. Do not reject H0


In this formulation, H0 states the welds do not conform to specification. This assertion will not be rejected
unless there is strong evidence to the contrary. Thus the burden of proof is on those who wish to assert that
the specification is satisfied. Using Ha: μ < 100 results in the welds being believed in conformance unless
proved otherwise, so the burden of proof is on the non-conformance claim.


Let σ denote the population standard deviation. The appropriate hypotheses are H0: σ = .05 v. Ha: σ < .05.
With this formulation, the burden of proof is on the data to show that the requirement has been met (the
sheaths will not be used unless H0 can be rejected in favor of Ha). Type I error: Conclude that the standard
deviation is < .05 mm when it is really equal to .05 mm. Type II error: Conclude that the standard
deviation is .05 mm when it is really < .05 mm.


A type I error here involves saying that the plant is not in compliance when in fact it is. A type II error
occurs when we conclude that the plant is in compliance when in fact it isn't. Reasonable people may
disagree as to which of the two errors is more serious. If in your judgment it is the type II error, then the
reformulation H0: μ = 150 v. Ha: μ < 150 makes the type I error more serious.


a. A type I error consists of judging one of the two companies favored over the other when in fact there is
a 50-50 split in the population. A type II error involves judging the split to be 50-50 when it is not.


b. We expect 25(.5) = 12.5 “successes” when H0 is true. So, any X-values less than 6 are at least as
contradictory to H0 as x = 6. But since the alternative hypothesis states p ≠ .5, X-values that are just as
far away on the high side are equally contradictory. Those are 19 and above.

So, values at least as contradictory to H0 as x = 6 are {0,1,2,3,4,5,6,19,20,21,22,23,24,25}.

c. When H0 is true, X has a binomial distribution with n = 25 and p = .5.
From part (b), P-value = P(X ≤ 6 or X ≥ 19) = B(6; 25, .5) + [1 − B(18; 25, .5)] = .014.


d. Looking at Table A.1, a two-tailed P-value of .044 (2 × .022) occurs when x = 7. That is, saying we'll
reject H0 iff P-value ≤ .044 must be equivalent to saying we'll reject H0 iff X ≤ 7 or X ≥ 18 (the same
distance from 12.5, but on the high side). Therefore, for any value of p ≠ .5, β(p) = P(do not reject H0
when X ~ Bin(25, p)) = P(7 < X < 18 when X ~ Bin(25, p)) = B(17; 25, p) − B(7; 25, p).

β(.4) = B(17; 25, .4) − B(7; 25, .4) = .845, while β(.3) = B(17; 25, .3) − B(7; 25, .3) = .488.
By symmetry (or re-computation), β(.6) = .845 and β(.7) = .488.


e. From part (c), the P-value associated with x = 6 is .014. Since .014 ≤ .044, the procedure in (d) leads
us to reject H0.
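
The binomial calculations in parts (c) and (d) can be reproduced without tables. This is a minimal Python sketch that assumes only the standard library; `binom_cdf` and `beta` are hypothetical helper names, with `binom_cdf(x, n, p)` playing the role of B(x; n, p).

```python
from math import comb

def binom_cdf(x, n, p):
    """B(x; n, p) = P(X <= x) for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, x + 1))

n = 25

# Part (c): two-tailed P-value, P(X <= 6 or X >= 19) when p = .5.
p_value = binom_cdf(6, n, 0.5) + (1 - binom_cdf(18, n, 0.5))

# Part (d): beta(p) = P(7 < X < 18) = B(17; 25, p) - B(7; 25, p).
def beta(p):
    return binom_cdf(17, n, p) - binom_cdf(7, n, p)

print(round(p_value, 3))
for p in (0.3, 0.4, 0.6, 0.7):
    print(p, round(beta(p), 3))
```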


13. 
a. H0: μ = 10 v. Ha: μ ≠ 10.


b. Since the alternative is two-sided, values at least as contradictory to H0 as x̄ = 9.85 are not only those
less than 9.85 but also those equally far from μ = 10 on the high side: i.e., x̄ values ≥ 10.15.


When H0 is true, X̄ has a normal distribution with mean μ = 10 and sd σ/√n = .200/√25 = .04. Hence,
P-value = P(X̄ ≤ 9.85 or X̄ ≥ 10.15 when H0 is true) = 2P(X̄ ≤ 9.85 when H0 is true) by symmetry
= 2P(Z ≤ (9.85 − 10)/.04) = 2Φ(−3.75) ≈ 0. (Software gives the more precise P-value .00018.)


In particular, since P-value ≈ 0 < α = .01, we reject H0 at the .01 significance level and conclude that
the true mean measured weight differs from 10 kg.


c. To determine β(μ) for any μ ≠ 10, we must first find the threshold between P-value ≤ α and P-value >
α in terms of x̄. Parallel to part (b), proceed as follows:


.01 = P(reject H0 when H0 is true) = 2P(X̄ ≤ x̄ when H0 is true) = 2Φ((x̄ − 10)/.04) ⇒
Φ((x̄ − 10)/.04) = .005 ⇒ (x̄ − 10)/.04 = −2.58 ⇒ x̄ = 9.8968. That is, we'd reject H0 at the α = .01 level iff the
observed value of X̄ is ≤ 9.8968 or, by symmetry, ≥ 10 + (10 − 9.8968) = 10.1032. Equivalently,
we do not reject H0 at the α = .01 level if 9.8968 < X̄ < 10.1032.

Now we can determine the chance of a type II error:

β(10.1) = P(9.8968 < X̄ < 10.1032 when μ = 10.1) = P(−5.08 < Z < .08) = .5319.

Similarly, β(9.8) = P(9.8968 < X̄ < 10.1032 when μ = 9.8) = P(2.42 < Z < 7.58) = .0078.
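
The type II error probabilities in part (c) follow a pattern that is easy to script. The sketch below is one way to do it in Python with the standard library's `statistics.NormalDist`; the function name `beta_two_sided_z` is hypothetical, and the critical value 2.58 is passed in explicitly to match the rounded table value used in the text.

```python
from statistics import NormalDist

def beta_two_sided_z(mu_true, mu0, sigma, n, z_crit):
    """P(fail to reject H0: mu = mu0) for a two-sided z test when mu = mu_true."""
    se = sigma / n ** 0.5
    shift = (mu0 - mu_true) / se
    std_normal = NormalDist()
    return std_normal.cdf(z_crit + shift) - std_normal.cdf(-z_crit + shift)

# Values from the exercise: mu0 = 10, sigma = .200, n = 25, z_.005 = 2.58.
beta_high = beta_two_sided_z(10.1, 10, 0.2, 25, 2.58)
beta_low = beta_two_sided_z(9.8, 10, 0.2, 25, 2.58)
print(round(beta_high, 4), round(beta_low, 4))
```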
Section 8.2


15. In each case, the direction of Ha indicates that the P-value is P(Z ≥ z) = 1 − Φ(z).

a. P-value = 1 − Φ(1.42) = .0778.

b. P-value = 1 − Φ(0.90) = .1841.

c. P-value = 1 − Φ(1.96) = .0250.

d. P-value = 1 − Φ(2.48) = .0066.

e. P-value = 1 − Φ(−.11) = .5438.
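
These upper-tailed P-values can be reproduced with the standard normal cdf. A minimal Python sketch, assuming only the standard library (the helper name `upper_tail_p_value` is hypothetical):

```python
from statistics import NormalDist

def upper_tail_p_value(z):
    """P(Z >= z) = 1 - Phi(z) for a standard normal Z."""
    return 1 - NormalDist().cdf(z)

# z values for parts (a) through (e).
for label, z in zip("abcde", (1.42, 0.90, 1.96, 2.48, -0.11)):
    print(label, round(upper_tail_p_value(z), 4))
```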
17. 

a. z = (30,960 − 30,000)/(1500/√16) = 2.56, so P-value = P(Z ≥ 2.56) = 1 − Φ(2.56) = .0052.
Since .0052 < α = .01, reject H0.
b. z_α = z.01 = 2.33, so β(30,500) = Φ(2.33 + (30,000 − 30,500)/(1500/√16)) = Φ(1.00) = .8413.
c. z_α = z.01 = 2.33 and z_β = z.05 = 1.645. Hence, n = [1500(2.33 + 1.645)/(30,000 − 30,500)]² = 142.2, so use n = 143.

d. From (a), the P-value is .0052. Hence, the smallest α at which H0 can be rejected is .0052.
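
The sample-size calculation in part (c) has the same form in several exercises of this section. The sketch below wraps it in a small Python function; the name `required_sample_size` is hypothetical, and the critical values are passed in as the rounded table values used in the text, since more precise critical values can change the rounded-up answer.

```python
from math import ceil

def required_sample_size(sigma, z_crit, z_beta, mu0, mu_alt):
    """n = [sigma (z_crit + z_beta) / (mu0 - mu_alt)]^2, rounded up.

    Use z_alpha for a one-tailed test and z_{alpha/2} for a two-tailed test.
    Returns the unrounded value together with the integer sample size.
    """
    n_exact = (sigma * (z_crit + z_beta) / (mu0 - mu_alt)) ** 2
    return n_exact, ceil(n_exact)

# Exercise 17(c): sigma = 1500, z_.01 = 2.33, z_.05 = 1.645.
n_exact, n_use = required_sample_size(1500, 2.33, 1.645, 30000, 30500)
print(round(n_exact, 1), n_use)
```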

19. 
a. Since the alternative hypothesis is two-sided, P-value = 2·[1 − Φ(|94.32 − 95|/(1.20/√16))] = 2·[1 − Φ(2.27)] =
2(.0116) = .0232. Since .0232 > α = .01, we do not reject H0 at the .01 significance level.
b. z_{α/2} = z.005 = 2.58, so β(94) = Φ(2.58 + (95 − 94)/(1.20/√16)) − Φ(−2.58 + (95 − 94)/(1.20/√16)) = Φ(5.91) − Φ(0.75) = .2266.


c. z_β = z.1 = 1.28. Hence, n = [1.20(2.58 + 1.28)/(95 − 94)]² = 21.46, so use n = 22.

21. The hypotheses are H0: μ = 5.5 v. Ha: μ ≠ 5.5.


a. The P-value is 2·[1 − Φ(|x̄ − 5.5|/(.3/√16))] = 2·[1 − Φ(3.33)] = .0008. Since the P-value is smaller than
any reasonable significance level (.1, .05, .01, .001), we reject H0.

b. The chance of detecting that H0 is false is the complement of the chance of a type II error. With z_{α/2} =
z.005 = 2.58, 1 − β(5.6) = 1 − [Φ(2.58 + (5.5 − 5.6)/(.3/√16)) − Φ(−2.58 + (5.5 − 5.6)/(.3/√16))] = 1 − Φ(1.25) + Φ(−3.91) =
.1056.


c. n = [.3(2.58 + 2.33)/(5.5 − 5.6)]² = 216.97, so use n = 217.


23. 
a. Using software, x̄ = 0.75, x̃ = 0.64, s = .3025, f_s = 0.48. These summary statistics, as well as a box plot
(not shown) indicate substantial positive skewness, but no outliers.


b. No, it is not plausible from the results in part a that the variable ALD is normal. However, since n =
49, normality is not required for the use of z inference procedures.


c. We wish to test H0: μ = 1.0 versus Ha: μ < 1.0. The test statistic is z = (0.75 − 1.0)/(.3025/√49) = −5.79, and so the
P-value is P(Z ≤ −5.79) ≈ 0. At any reasonable significance level, we reject the null hypothesis.
Therefore, yes, the data provides strong evidence that the true average ALD is less than 1.0.


d. x̄ + z.05·s/√n = 0.75 + 1.645(.3025/√49) = 0.821.


25. Let μ denote the true average task time. The hypotheses of interest are H0: μ = 2 v. Ha: μ < 2. Using z-based
inference with the data provided, the P-value of the test is P(Z ≤ (x̄ − 2)/(.20/√52)) = Φ(−1.80) = .0359. Since
.0359 > .01, at the α = .01 significance level we do not reject H0. At the .01 level, we do not have sufficient
evidence to conclude that the true average task time is less than 2 seconds.


27. β(μ0 − Δ) = Φ(z_{α/2} + Δ√n/σ) − Φ(−z_{α/2} + Δ√n/σ) = 1 − Φ(−z_{α/2} − Δ√n/σ) − [1 − Φ(z_{α/2} − Δ√n/σ)] =
Φ(z_{α/2} − Δ√n/σ) − Φ(−z_{α/2} − Δ√n/σ) = β(μ0 + Δ).


Section 8.3 


29. The hypotheses are H0: μ = .5 versus Ha: μ ≠ .5. Since this is a two-sided test, we must double the one-tail
area in each case to determine the P-value.
a. n = 13 ⇒ df = 13 − 1 = 12. Looking at column 12 of Table A.8, the area to the right of t = 1.6 is .068.
Doubling this area gives the two-tailed P-value of 2(.068) = .136. Since .136 > α = .05, we do not
reject H0.


b. For a two-sided test, observing t = −1.6 is equivalent to observing t = 1.6. So, again the P-value is
2(.068) = .136, and again we do not reject H0 at α = .05.




Chapter 8: Tests of Hypotheses Based on a Single Sample 


c. df = n − 1 = 24; the area to the left of −2.6 = the area to the right of 2.6 = .008 according to Table A.8.
Hence, the two-tailed P-value is 2(.008) = .016. Since .016 > .01, we do not reject H0 in this case.
d. Similar to part (c), Table A.8 gives a one-tail area of .000 for t = ±3.9 at df = 24. Hence, the two-tailed
P-value is 2(.000) = .000, and we reject H0 at any reasonable α level.
31. This is an upper-tailed test, so the P-value in each case is P(T ≥ observed t).
a. P-value = P(T ≥ 3.2 with df = 14) = .003 according to Table A.8. Since .003 ≤ .05, we reject H0.
b. P-value = P(T ≥ 1.8 with df = 8) = .055. Since .055 > .01, do not reject H0.
c. P-value = P(T ≥ −.2 with df = 23) = 1 − P(T ≥ .2 with df = 23) by symmetry = 1 − .422 = .578. Since
.578 is quite large, we would not reject H0 at any reasonable α level. (Note that the sign of the observed
t statistic contradicts Ha, so we know immediately not to reject H0.)
33.
a. It appears that the true average weight could be significantly off from the production specification of
200 lb per pipe. Most of the boxplot is to the right of 200.
b. Let μ denote the true average weight of a 200 lb pipe. The appropriate null and alternative hypotheses
are H0: μ = 200 and Ha: μ ≠ 200. Since the data are reasonably normal, we will use a one-sample t
procedure. Our test statistic is t = (206.73 − 200)/(6.35/√30) = 6.73/1.16 = 5.80, for a P-value of ≈ 0. So, we reject H0.
At the 5% significance level, the test appears to substantiate the statement in part a.
35. 
a. The hypotheses are H0: μ = 200 versus Ha: μ > 200. With the data provided,
t = (x̄ − μ0)/(s/√n) = (249.7 − 200)/(145.1/√12) = 1.19; at df = 12 − 1 = 11, P-value ≈ .128. Since .128 > .05, H0 is not rejected
at the α = .05 level. We have insufficient evidence to conclude that the true average repair time
exceeds 200 minutes.
b. With d = |μ0 − μ′|/σ = |200 − 300|/150 = 0.67, df = 11, and α = .05, software calculates power ≈ .70, so
β(300) ≈ .30.


The accompanying normal probability plot is acceptably linear, which suggests that a normal
population distribution is quite plausible.


[Figure: normal probability plot of Strength (Normal, 95% CI)]


The parameter of interest is μ = the true average compression strength (MPa) for this type of concrete.

The hypotheses are H0: μ = 100 versus Ha: μ < 100.

Since the data come from a plausibly normal population, we will use the t procedure. The test statistic
is t = (x̄ − μ0)/(s/√n) = (96.42 − 100)/(8.26/√10) = −1.37. The corresponding one-tailed P-value, at df = 10 − 1 = 9, is
P(T ≤ −1.37) ≈ .102.

The P-value slightly exceeds .10, the largest α level we'd consider using in practice, so the null
hypothesis H0: μ = 100 should not be rejected. This concrete should be used.

39. Software provides x̄ = 1.243 and s = 0.448 for this sample.


The parameter of interest is μ = the population mean expense ratio (%) for large-cap growth mutual
funds. The hypotheses are H0: μ = 1 versus Ha: μ > 1.

We have a random sample, and a normal probability plot is reasonably linear, so the assumptions for a
t procedure are met.


The test statistic is t = (1.243 − 1)/(0.448/√20) = 2.43, for a P-value of P(T ≥ 2.43 at df = 19) ≈ .013. Hence, we
(barely) fail to reject H0 at the .01 significance level. There is insufficient evidence, at the α = .01 level,
to conclude that the population mean expense ratio for large-cap growth mutual funds exceeds 1%.


A Type I error would be to incorrectly conclude that the population mean expense ratio for large-cap
growth mutual funds exceeds 1% when, in fact, the mean is 1%. A Type II error would be to fail to
recognize that the population mean expense ratio for large-cap growth mutual funds exceeds 1% when
that's actually true.

Since we failed to reject H0 in (a), we potentially committed a Type II error there. If we later find out
that, in fact, μ = 1.33, so Ha was actually true all along, then yes we have committed a Type II error.


With n = 20 so df = 19, d = (1.33 − 1)/.5 = .66, and α = .01, software provides power ≈ .66. (Note: it's
purely a coincidence that power and d are the same decimal!) This means that if the true values of μ
and σ are μ = 1.33 and σ = .5, then there is a 66% probability of correctly rejecting H0: μ = 1 in favor of
Ha: μ > 1 at the .01 significance level based upon a sample of size n = 20.


41. μ = true average reading, H0: μ = 70 v. Ha: μ ≠ 70, and t = (x̄ − 70)/(s/√n) = (75.5 − 70)/(7/√6) = 5.5/2.86 = 1.92.


From Table A.8, df = 5, P-value = 2[P(T ≥ 1.92)] ≈ 2(.058) = .116. At significance level .05, there is not
enough evidence to conclude that the spectrophotometer needs recalibrating.


Section 8.4 


43. 
a. The parameter of interest is p = the proportion of the population of female workers that have BMIs of
at least 30 (and, hence, are obese). The hypotheses are H0: p = .20 versus Ha: p > .20.

With n = 541, np0 = 541(.2) = 108.2 ≥ 10 and n(1 − p0) = 541(.8) = 432.8 ≥ 10, so the “large-sample” z
procedure is applicable.

From the data provided, p̂ = 120/541 = .2218, so z = (p̂ − p0)/√(p0(1 − p0)/n) = (.2218 − .20)/√(.20(.80)/541) = 1.27 and P-value
= P(Z ≥ 1.27) = 1 − Φ(1.27) = .1020. Since .1020 > .05, we fail to reject H0 at the α = .05 level. We do
not have sufficient evidence to conclude that more than 20% of the population of female workers is
obese.


b. A Type I error would be to incorrectly conclude that more than 20% of the population of female
workers is obese, when the true percentage is 20%. A Type II error would be to fail to recognize that
more than 20% of the population of female workers is obese when that's actually true.


c. The question is asking for the chance of committing a Type II error when the true value of p is .25, i.e.,
β(.25). Using the textbook formula,

β(.25) = Φ([.20 − .25 + 1.645√(.20(.80)/541)] / √(.25(.75)/541)) = Φ(−1.166) ≈ .121.
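
The type II error formula in part (c) can be evaluated directly. The following Python sketch is illustrative only; the function name `beta_upper_tailed_proportion` is hypothetical, and the critical value 1.645 is supplied as the rounded table value used in the text.

```python
from math import sqrt
from statistics import NormalDist

def beta_upper_tailed_proportion(p0, p_alt, n, z_alpha):
    """Type II error probability for an upper-tailed large-sample test of H0: p = p0."""
    numerator = p0 - p_alt + z_alpha * sqrt(p0 * (1 - p0) / n)
    denominator = sqrt(p_alt * (1 - p_alt) / n)
    return NormalDist().cdf(numerator / denominator)

# Part (c): p0 = .20, true p = .25, n = 541, z_.05 = 1.645.
beta_25 = beta_upper_tailed_proportion(0.20, 0.25, 541, 1.645)
print(round(beta_25, 3))
```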


45. Let p = true proportion of all donors with type A blood. The hypotheses are H0: p = .40 versus Ha: p ≠ .40.


Using the one-proportion z procedure, the test statistic is z = (p̂ − .40)/√(.40(.60)/150) = (p̂ − .40)/.04 = 3.667, and the
corresponding P-value is 2P(Z ≥ 3.667) ≈ 0. Hence, we reject H0. The data does suggest that the
percentage of all donors with type A blood differs from 40% (at the .01 significance level). Since the
P-value is also less than .05, the conclusion would not change.


47. 
a. The parameter of interest is p = the proportion of all wine customers who would find screw tops
acceptable. The hypotheses are H0: p = .25 versus Ha: p < .25.
With n = 106, np0 = 106(.25) = 26.5 ≥ 10 and n(1 − p0) = 106(.75) = 79.5 ≥ 10, so the “large-sample” z
procedure is applicable.

From the data provided, p̂ = 22/106 = .208, so z = (.208 − .25)/√(.25(.75)/106) = −1.01 and P-value = P(Z ≤ −1.01) =
Φ(−1.01) = .1562.

Since .1562 > .10, we fail to reject H0 at the α = .10 level. We do not have sufficient evidence to
suggest that less than 25% of all customers find screw tops acceptable. Therefore, we recommend that
the winery should switch to screw tops.

b. A Type I error would be to incorrectly conclude that less than 25% of all customers find screw tops
acceptable, when the true percentage is 25%. Hence, we'd recommend not switching to screw tops
when their use is actually justified. A Type II error would be to fail to recognize that less than 25% of
all customers find screw tops acceptable when that's actually true. Hence, we'd recommend (as we did
in (a)) that the winery switch to screw tops when the switch is not justified. Since we failed to reject H0
in (a), we may have committed a Type II error.


49. 
a. Let p = true proportion of current customers who qualify. The hypotheses are H0: p = .05 v. Ha: p ≠ .05.


The test statistic is z = (p̂ − .05)/√(.05(.95)/n) = 3.07, and the P-value is 2·P(Z ≥ 3.07) = 2(.0011) = .0022.


Since .0022 ≤ α = .01, H0 is rejected. The company's premise is not correct.


b. β(.10) = Φ([.05 − .10 + 2.58√(.05(.95)/500)] / √(.10(.90)/500)) − Φ([.05 − .10 − 2.58√(.05(.95)/500)] / √(.10(.90)/500))
= Φ(−1.85) − 0 = .0322


51. The hypotheses are H0: p = .10 v. Ha: p > .10, and we reject H0 iff X ≥ c for some unknown c. The
corresponding chance of a type I error is α = P(X ≥ c when p = .10) = 1 − B(c − 1; 10, .1), since the rv X has
a Binomial(10, .1) distribution when H0 is true.
The values n = 10, c = 3 yield α = 1 − B(2; 10, .1) = .07, while α > .10 for c = 0, 1, 2. Thus c = 3 is the best
choice to achieve α ≤ .10 and simultaneously minimize β. However, β(.3) = P(X < c when p = .3) =
B(2; 10, .3) = .383, which has been deemed too high. So, the desired α and β levels cannot be achieved with
a sample size of just n = 10.
The values n = 20, c = 5 yield α = 1 − B(4; 20, .1) = .043, but again β(.3) = B(4; 20, .3) = .238 is too high.
The values n = 25, c = 5 yield α = 1 − B(4; 25, .1) = .098 while β(.3) = B(4; 25, .3) = .090 ≤ .10, so n = 25
should be used. In that case and with the rule that we reject H0 iff X ≥ 5, α = .098 and β(.3) = .090.
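
The trial-and-error search over (n, c) described above can be checked with a short script. This Python sketch assumes only the standard library; `binom_cdf` and `error_rates` are hypothetical helper names, and the defaults p0 = .10 and p = .30 are the values from the exercise.

```python
from math import comb

def binom_cdf(x, n, p):
    """B(x; n, p) = P(X <= x) for X ~ Bin(n, p); returns 0 when x < 0."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, x + 1))

def error_rates(n, c, p0=0.10, p_alt=0.30):
    """alpha and beta(p_alt) for the rule 'reject H0 iff X >= c'."""
    alpha = 1 - binom_cdf(c - 1, n, p0)
    beta = binom_cdf(c - 1, n, p_alt)
    return alpha, beta

# The three (n, c) pairs considered in the solution.
for n, c in ((10, 3), (20, 5), (25, 5)):
    alpha, beta = error_rates(n, c)
    print(n, c, round(alpha, 3), round(beta, 3))
```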


Section 8.5 


53. 
a. The formula for β is 1 − Φ(−2.33 + √n/9), which gives .8888 for n = 100, .1587 for n = 900, and .0006
for n = 2500.
b. z = −5.3, which is “off the z table,” so P-value < .0002; this value of z is quite statistically significant.


c. No. Even when the departure from H0 is insignificant from a practical point of view, a statistically
significant result is highly likely to appear; the test is too likely to detect small departures from H0.

55.
a. The chance of committing a type I error on a single test is .01. Hence, the chance of committing at
least one type I error among m tests is P(at least one error) = 1 − P(no type I errors) = 1 − [P(no type I
error)]^m by independence = 1 − .99^m. For m = 5, the probability is .049; for m = 10, it's .096.


b. Set the answer from (a) to .5 and solve for m: 1 − .99^m ≥ .5 ⇒ .99^m ≤ .5 ⇒ m ≥ log(.5)/log(.99) = 68.97.
So, at least 69 tests must be run at the α = .01 level to have a 50-50 chance of committing at least one
type I error.
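
The multiple-testing arithmetic in both parts is easy to verify. A minimal Python sketch follows, assuming independent tests as in the solution; the helper name `prob_at_least_one_type1` is hypothetical.

```python
from math import ceil, log

def prob_at_least_one_type1(m, alpha=0.01):
    """P(at least one type I error among m independent level-alpha tests)."""
    return 1 - (1 - alpha) ** m

# Part (a): m = 5 and m = 10 tests at the .01 level.
print(round(prob_at_least_one_type1(5), 3), round(prob_at_least_one_type1(10), 3))

# Part (b): smallest m with probability at least .5.
m_needed = ceil(log(0.5) / log(0.99))
print(m_needed)
```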


Supplementary Exercises 


57. Because n = 50 is large, we use a z test here. The hypotheses are H0: μ = 3.2 versus Ha: μ ≠ 3.2. The
computed z value is z = (3.05 − 3.20)/(.34/√50) = −3.12, and the P-value is 2P(Z ≥ |−3.12|) = 2(.0009) = .0018. Since
.0018 < .05, H0 should be rejected in favor of Ha.


59.
a. H0: μ = .85 v. Ha: μ ≠ .85
b. With a P-value of .30, we would fail to reject the null hypothesis at any reasonable significance level, which
includes both .05 and .10.
61. 


a. The parameter of interest is μ = the true average contamination level (Total Cu, in mg/kg) in this
region. The hypotheses are H0: μ = 20 versus Ha: μ > 20. Using a one-sample t procedure, with x̄ =
45.31 and SE(x̄) = 5.26, the test statistic is t = (x̄ − 20)/SE(x̄) = 3.86. That's a very large t-statistic;
however, at df = 3 − 1 = 2, the P-value is P(T ≥ 3.86) ≈ .03. (Using the tables with t = 3.9 gives a
P-value of ≈ .02.) Since the P-value exceeds .01, we would fail to reject H0 at the α = .01 level.

This is quite surprising, given the large t-value (45.31 greatly exceeds 20), but it's a result of the very
small n.


b. We want the probability that we fail to reject H0 in part (a) when n = 3 and the true values of μ and σ
are μ = 50 and σ = 10, i.e. β(50). Using software, we get β(50) ≈ .57.


63. n = 47, x̄ = 215 mg, s = 235 mg, range of values = 5 mg to 1,176 mg
a. No, the distribution does not appear to be normal. It appears to be skewed to the right, since 0 is less
than one standard deviation below the mean. It is not necessary to assume normality if the sample size
is large enough due to the central limit theorem. This sample size is large enough so we can conduct a
hypothesis test about the mean.


b. The parameter of interest is μ = true average daily caffeine consumption of adult women, and the hypotheses
are H0: μ = 200 versus Ha: μ > 200. The test statistic (using a z test) is z = (215 − 200)/(235/√47) = .44, with a
corresponding P-value of P(Z ≥ .44) = 1 − Φ(.44) = .33. We fail to reject H0, because .33 > .10. The
data do not provide convincing evidence that daily consumption of all adult women exceeds 200 mg.

65.
a. From Table A.17, when μ = 9.5, d = .625, and df = 9, β ≈ .60.
When μ = 9.0, d = 1.25, and df = 9, β ≈ .20.
b. From Table A.17, when β = .25 and d = .625, n ≈ 28.
67.
a. With H0: p = 1/75 v. Ha: p ≠ 1/75, p̂ = 16/800 = .02, z = (.02 − .01333)/√(.01333(.98667)/800) = 1.645, and P-value = .10, we
fail to reject the null hypothesis at the α = .05 level. There is no significant evidence that the incidence
rate among prisoners differs from that of the adult population.
The possible error we could have made is a type II.
b. P-value = 2[1 − Φ(1.645)] = 2[.05] = .10. Yes, since .10 < .20, we could reject H0.
69. Even though the underlying distribution may not be normal, a z test can be used because n is large. The
null hypothesis H0: μ = 3200 should be rejected in favor of Ha: μ < 3200 if the P-value is less than .001.
The computed test statistic is z = (x̄ − 3200)/(188/√45) = −3.32 and the P-value is Φ(−3.32) = .0005 < .001, so H0
should be rejected at level .001.


71. We wish to test H0: μ = 4 versus Ha: μ > 4 using the test statistic z = (x̄ − 4)/√(4/n). For the given sample, n = 36
and x̄ = 4.444, so z = (4.444 − 4)/√(4/36) = 1.33.

The P-value is P(Z ≥ 1.33) = 1 − Φ(1.33) = .0918. Since .0918 > .02, H0 should not be rejected at this level.
We do not have significant evidence at the .02 level to conclude that the true mean of this Poisson process
is greater than 4.
36 


73. The parameter of interest is p = the proportion of all college students who have maintained lifetime
abstinence from alcohol. The hypotheses are H0: p = .1, Ha: p > .1.
With n = 462, np0 = 462(.1) = 46.2 ≥ 10 and n(1 − p0) = 462(.9) = 415.8 ≥ 10, so the “large-sample” z procedure
is applicable.

From the data provided, p̂ = 51/462 = .1104, so z = (.1104 − .1)/√(.1(.9)/462) = 0.74.

The corresponding one-tailed P-value is P(Z ≥ 0.74) = 1 − Φ(0.74) = .2296.

Since .2296 > .05, we fail to reject H0 at the α = .05 level (and, in fact, at any reasonable significance level).
The data does not give evidence to suggest that more than 10% of all college students have completely
abstained from alcohol use.


75. Since n is large, we'll use the one-sample z procedure. With μ = population mean Vitamin D level for
infants, the hypotheses are H0: μ = 20 v. Ha: μ > 20. The test statistic is z = (21 − 20)/(11/√102) = 0.92, and the
upper-tailed P-value is P(Z ≥ 0.92) = .1788. Since .1788 > .10, we fail to reject H0. It cannot be concluded
that μ > 20.


77. The 20 df row of Table A.7 shows that χ².99,20 = 8.26 < 8.58 (H0 not rejected at level .01) and
8.58 < 9.591 = χ².975,20 (H0 rejected at level .025). Thus .01 < P-value < .025, and H0 cannot be rejected at
level .01 (the P-value is the smallest α at which rejection can take place, and this exceeds .01).


79. When H0 is true, 2λ0ΣX_i = (2/μ0)ΣX_i has a chi-squared distribution with df = 2n. If the alternative is
Ha: μ < μ0, then we should reject H0 in favor of Ha when the sample mean x̄ is small. Since x̄ is small
exactly when Σx_i is small, we'll reject H0 when the test statistic is small. In particular, the P-value
should be the area to the left of the observed value (2/μ0)Σx_i.


The hypotheses are H0: μ = 75 versus Ha: μ < 75. The test statistic value is (2/μ0)Σx_i = (2/75)(737) =
19.65. At df = 2(10) = 20, the P-value is the area to the left of 19.65 under the χ² curve with 20 df. From
software, this is about .52, so H0 clearly should not be rejected (the P-value is very large). The sample
data do not suggest that true average lifetime is less than the previously claimed value.


CHAPTER 9


Section 9.1 
a. E(X̄ − Ȳ) = E(X̄) − E(Ȳ) = 4.1 − 4.5 = −.4, irrespective of sample sizes.


b. V(X̄ − Ȳ) = V(X̄) + V(Ȳ) = σ1²/m + σ2²/n = .0724, and the SD of X̄ − Ȳ is
√.0724 = .2691.


c. A normal curve with mean and sd as given in a and b (because m = n = 100, the CLT implies that both
X̄ and Ȳ have approximately normal distributions, so X̄ − Ȳ does also). The shape is not
necessarily that of a normal curve when m = n = 10, because the CLT cannot be invoked. So if the two
lifetime population distributions are not normal, the distribution of X̄ − Ȳ will typically be quite
complicated.


3. Let μ1 = the population mean pain level under the control condition and μ2 = the population mean pain
level under the treatment condition.
a. The hypotheses of interest are H0: μ1 − μ2 = 0 versus Ha: μ1 − μ2 > 0. With the data provided, the test
statistic value is z = [(5.2 − 3.1) − 0]/√(2.3²/43 + 2.3²/43) = 4.23. The corresponding P-value is
P(Z ≥ 4.23) = 1 − Φ(4.23) ≈ 0.
Hence, we reject H0 at the α = .01 level (in fact, at any reasonable level) and conclude that the average
pain experienced under treatment is less than the average pain experienced under control.


b. Now the hypotheses are H0: μ1 − μ2 = 1 versus Ha: μ1 − μ2 > 1. The test statistic value is
z = [(5.2 − 3.1) − 1]/√(2.3²/43 + 2.3²/43) = 2.22, and the P-value is P(Z ≥ 2.22) = 1 − Φ(2.22) = .0132. Thus we would reject
H0 at the α = .05 level and conclude that mean pain under control condition exceeds that of treatment
condition by more than 1 point. However, we would not reach the same decision at the α = .01 level
(because .0132 ≤ .05 but .0132 > .01).


a. Ha says that the average calorie output for sufferers is more than 1 cal/cm²/min below that for
non-sufferers. √(σ1²/m + σ2²/n) = √((.2)²/10 + (.4)²/10) = .1414, so z = [(.64 − 2.05) − (−1)]/.1414 = −2.90. The P-value for this
one-sided test is P(Z ≤ −2.90) = .0019 < .01. So, at level .01, H0 is rejected.


b. z_α = z.01 = 2.33, and so β(−1.2) = 1 − Φ(−2.33 − (−1.2 + 1)/.1414) = 1 − Φ(−.92) = .8212.
, ck Ee | 
(-.2) 
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Chapter 9: Inferences Based on Two Samples 


7. Let μ1 denote the true mean course GPA for all courses taught by full-time faculty, and let μ2 denote the
true mean course GPA for all courses taught by part-time faculty. The hypotheses of interest are H0: μ1 = μ2
versus Ha: μ1 ≠ μ2; or, equivalently, H0: μ1 − μ2 = 0 v. Ha: μ1 − μ2 ≠ 0.


The large-sample test statistic is z = [(x̄ − ȳ) − Δ0]/√(s1²/m + s2²/n) = (x̄ − ȳ)/√((.63342)²/125 + (.49241)²/88) = −1.88. The corresponding
two-tailed P-value is P(|Z| ≥ |−1.88|) = 2[1 − Φ(1.88)] = .0602.
Since the P-value exceeds α = .01, we fail to reject H0. At the .01 significance level, there is insufficient
evidence to conclude that the true mean course GPAs differ for these two populations of faculty.


9. 
a. Point estimate x̄ − ȳ = 19.9 − 13.7 = 6.2. It appears that there could be a difference.
b. H0: μ1 − μ2 = 0, Ha: μ1 − μ2 ≠ 0, z = (19.9 − 13.7)/√(39.1²/60 + 15.8²/60) = 6.2/5.44 = 1.14, and the P-value = 2[P(Z ≥ 1.14)] =
2(.1271) = .2542. The P-value is larger than any reasonable α, so we do not reject H0. There is no
statistically significant difference.

c. No. With a normal distribution, we would expect most of the data to be within 2 standard deviations of
the mean, and the distribution should be symmetric. Two sd's above the mean is 98.1, but the
distribution stops at zero on the left. The distribution is positively skewed.

d. We will calculate a 95% confidence interval for μ1, the true average length of stays for patients given
the treatment. 19.9 ± 1.96(39.1/√60) = 19.9 ± 9.9 = (10.0, 29.8).
Te ( ) 
11. (x̄ − ȳ) ± z_{α/2}·√(s1²/m + s2²/n) = (x̄ − ȳ) ± z_{α/2}·√((SE1)² + (SE2)²). Using α = .05 and z_{α/2} = 1.96 yields
(5.5 − 3.8) ± 1.96√((0.3)² + (0.2)²) = (0.99, 2.41). We are 95% confident that the true average blood lead
level for male workers is between 0.99 and 2.41 higher than the corresponding average for female workers.
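
The interval above combines two reported standard errors, which is straightforward to compute directly. A minimal Python sketch, with the hypothetical helper name `two_sample_z_ci`:

```python
from math import sqrt

def two_sample_z_ci(xbar, ybar, se1, se2, z=1.96):
    """Large-sample CI for mu1 - mu2 built from the two standard errors."""
    margin = z * sqrt(se1**2 + se2**2)
    diff = xbar - ybar
    return diff - margin, diff + margin

# Values from the exercise: means 5.5 and 3.8, standard errors 0.3 and 0.2.
lower, upper = two_sample_z_ci(5.5, 3.8, 0.3, 0.2)
print(round(lower, 2), round(upper, 2))
```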
13. σ1 = σ2 = .05, d = .04, α = .01, β = .05, and the test is one-tailed ⇒
n = (.0025 + .0025)(2.33 + 1.645)²/.0016 = 49.38, so use n = 50.
15.
a. As either m or n increases, SD decreases, so (μ1 − μ2 − Δ0)/SD increases (the numerator is positive), so
z_α − (μ1 − μ2 − Δ0)/SD decreases, so β = Φ(z_α − (μ1 − μ2 − Δ0)/SD) decreases.


b. As β decreases, z_β increases, and since z_β is in the numerator of n, n increases also.


Section 9.2 


17.
a. ν = (5²/10 + 6²/10)² / [(5²/10)²/9 + (6²/10)²/9] = 37.21/(.694 + 1.44) ≈ 17.4, so use ν = 17.
b. ν = (5²/10 + 6²/15)² / [(5²/10)²/9 + (6²/15)²/14] = 24.01/(.694 + .411) ≈ 21.7, so use ν = 21.
c. ν = (2²/10 + 6²/15)² / [(2²/10)²/9 + (6²/15)²/14] = 7.84/(.018 + .411) ≈ 18.3, so use ν = 18.
d. ν = (5²/12 + 6²/24)² / [(5²/12)²/11 + (6²/24)²/23] = 12.84/(.395 + .098) ≈ 26, so use ν = 26.
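
The degrees-of-freedom formula used in all four parts can be wrapped in a small function. This Python sketch is illustrative; `welch_df` is a hypothetical name, and the result is rounded down to an integer as in the solutions.

```python
from math import floor

def welch_df(s1, m, s2, n):
    """Approximate df for the two-sample t procedure with unequal variances."""
    a = s1**2 / m
    b = s2**2 / n
    return (a + b) ** 2 / (a**2 / (m - 1) + b**2 / (n - 1))

# (s1, m, s2, n) for parts (a) through (d).
cases = {"a": (5, 10, 6, 10), "b": (5, 10, 6, 15), "c": (2, 10, 6, 15), "d": (5, 12, 6, 24)}
for label, args in cases.items():
    nu = welch_df(*args)
    print(label, round(nu, 2), floor(nu))
```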
19. For the given hypotheses, the test statistic is t = (x̄ − ȳ − Δ0)/√(5.03²/6 + 5.38²/6) = (x̄ − ȳ − Δ0)/3.007 = −1.20, and the df is
ν = (4.2168 + 4.8241)² / [(4.2168)²/5 + (4.8241)²/5] = 9.96, so use df = 9. The P-value is P(T ≤ −1.20 when T ~ t9) ≈ .130.
Since .130 > .01, we don't reject H0.
21. Let μ1 = the true average gap detection threshold for normal subjects, and μ2 = the corresponding value for
CTS subjects. The relevant hypotheses are H0: μ1 − μ2 = 0 v. Ha: μ1 − μ2 < 0, and the test statistic is
t = (x̄ − ȳ)/√(.0351125 + .07569) = (x̄ − ȳ)/.3329 = −2.46. Using df ν = (.0351125 + .07569)² / [(.0351125)²/7 + (.07569)²/9] = 15.1, or 15, the P-value
is P(T ≤ −2.46 when T ~ t15) ≈ .013. Since .013 > .01, we fail to reject H0 at the α = .01 level. We have
insufficient evidence to claim that the true average gap detection threshold for CTS subjects exceeds that
for normal subjects.


23. 
a. Using Minitab to generate normal probability plots, we see that both plots illustrate sufficient linearity.
Therefore, it is plausible that both samples have been selected from normal population distributions.

[Figures: Normal Probability Plot for High Quality Fabric; Normal Probability Plot for Poor Quality Fabric]


b. The comparative boxplot does not suggest a difference between average extensibility for the two types
of fabrics.

[Figure: Comparative Box Plot for High Quality and Poor Quality Fabric; horizontal axis: extensibility (%)]


c. We test $H_0: \mu_1 - \mu_2 = 0$ v. $H_a: \mu_1 - \mu_2 \neq 0$. With degrees of freedom $\nu = \frac{(.0433265)^2}{.00017906} = 10.5$
(which we round down to 10) and test statistic $t = \frac{\bar{x} - \bar{y}}{\sqrt{.0433265}} = -.38 \approx -0.4$, the P-value is
2(.349) = .698. Since the P-value is very large, we do not reject $H_0$. There is insufficient evidence to
claim that the true average extensibility differs for the two types of fabrics.


25. 
a. Normal probability plots of both samples (not shown) exhibit substantial linear patterns, suggesting
that the normality assumption is reasonable for both populations of prices.

b. The comparative boxplots below suggest that the average price for a wine earning a ≥ 93 rating is
much higher than the average price for a wine earning a ≤ 89 rating.

c. From the data provided, $\bar{x} = 110.8$, $\bar{y} = 61.7$, $s_1 = 48.7$, $s_2 = 23.8$, and $\nu = 15$. The resulting 95% CI for
the difference of population means is $(110.8 - 61.7) \pm t_{.025,15}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}} = (16.1, 82.0)$. That is, we
are 95% confident that wines rated ≥ 93 cost, on average, between $16.10 and $82.00 more than wines
rated ≤ 89. Since the CI does not include 0, this certainly contradicts the claim that price and quality
are unrelated.


27.
a. Let's construct a 99% CI for $\mu_{AN}$, the true mean intermuscular adipose tissue (IAT) under the
described AN protocol. Assuming the data comes from a normal population, the CI is given by
$\bar{x} \pm t_{.005,15}\frac{s}{\sqrt{n}} = .52 \pm 2.947\frac{.26}{\sqrt{16}} = (.33, .71)$. We are 99% confident that the true mean
IAT under the AN protocol is between .33 kg and .71 kg.

b. Let's construct a 99% CI for $\mu_{AN} - \mu_C$, the difference between true mean AN IAT and true mean
control IAT. Assuming the data come from normal populations, the CI is given by
$(\bar{x} - \bar{y}) \pm t_{.005,21}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}} = (.52 - .35) \pm 2.831\sqrt{\frac{(.26)^2}{16} + \frac{(.15)^2}{8}} = .17 \pm .24 = (-.07, .41)$.
Since this CI includes zero, it's plausible that the difference between the two true means is zero (i.e.,
$\mu_{AN} - \mu_C = 0$). [Note: the df calculation $\nu$ = 21 comes from applying the formula in the textbook.]


29. Let $\mu_1$ = the true average compression strength for strawberry drink and let $\mu_2$ = the true average
compression strength for cola. A lower tailed test is appropriate. We test $H_0: \mu_1 - \mu_2 = 0$ v. $H_a: \mu_1 - \mu_2 < 0$.
The test statistic is $t = \frac{\bar{x} - \bar{y}}{\sqrt{29.4 + 15}} = -2.10$; $\nu = \frac{(44.4)^2}{\frac{(29.4)^2}{14} + \frac{(15)^2}{14}} = \frac{1971.36}{77.8114} = 25.3$, so use df = 25.
The P-value ≈ P(T < −2.10) = .023. This P-value indicates strong support for the alternative hypothesis.
The data does suggest that the extra carbonation of cola results in a higher average compression strength.
31.


a. The most notable feature of these boxplots is the larger amount of variation present in the mid-range 
data compared to the high-range data. Otherwise, both look reasonably symmetric with no outliers 
present. 


[Figure: Comparative Box Plot for High Range and Mid Range]


b. Using df = 23, a 95% confidence interval for $\mu_{mid-range} - \mu_{high-range}$ is
$(438.3 - 437.45) \pm 2.069\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}} = .85 \pm 8.69 = (-7.84, 9.54)$. Since plausible values for
$\mu_{mid-range} - \mu_{high-range}$ are both positive and negative (i.e., the interval spans zero) we would conclude that
there is not sufficient evidence to suggest that the average value for mid-range and the average value
for high-range differ.


33. Let $\mu_1$ and $\mu_2$ represent the true mean body mass decrease for the vegan diet and the control diet,
respectively. We wish to test the hypotheses $H_0: \mu_1 - \mu_2 \le 1$ v. $H_a: \mu_1 - \mu_2 > 1$. The relevant test statistic is
$t = \frac{(\bar{x} - \bar{y}) - 1}{\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}} = 1.33$, with estimated df = 60 using the formula. Rounding to t = 1.3, Table A.8 gives a
one-sided P-value of .098 (a computer will give the more accurate P-value of .094).
Since our P-value > $\alpha$ = .05, we fail to reject $H_0$ at the 5% level. We do not have statistically significant
evidence that the true average weight loss for the vegan diet exceeds the true average weight loss for the
control diet by more than 1 kg.


35. There are two changes that must be made to the procedure we currently use. First, the equation used to
compute the value of the t test statistic is $t = \frac{(\bar{x} - \bar{y}) - \Delta}{s_p\sqrt{\frac{1}{m} + \frac{1}{n}}}$, where $s_p$ is defined as in Exercise 34. Second,
the degrees of freedom = m + n − 2. Assuming equal variances in the situation from Exercise 33, we
calculate $s_p$ as follows: $s_p = \sqrt{\left(\frac{7}{16}\right)(2.6)^2 + \left(\frac{9}{16}\right)(2.5)^2} = 2.544$. The value of the test statistic is,
then, $t = \frac{(32.8 - 40.5) - (-5)}{2.544\sqrt{\frac{1}{8} + \frac{1}{10}}} = -2.24 \approx -2.2$ with df = 16, and the P-value is P(T < −2.2) = .021. Since
.021 > .01, we fail to reject $H_0$.
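The pooled calculation in Exercise 35 can be reproduced in a few lines. The sketch below assumes the summary values that appear in the solution (means 32.8 and 40.5, standard deviations 2.6 and 2.5, sample sizes 8 and 10, null difference −5); `pooled_t` is a hypothetical helper name, and the P-value is still read from the t table rather than computed.

```python
import math

def pooled_t(xbar, s1, m, ybar, s2, n, delta0=0.0):
    """Pooled two-sample t statistic, assuming equal population variances."""
    df = m + n - 2
    # Pooled standard deviation: df-weighted average of the sample variances
    sp = math.sqrt(((m - 1) * s1**2 + (n - 1) * s2**2) / df)
    t = ((xbar - ybar) - delta0) / (sp * math.sqrt(1 / m + 1 / n))
    return sp, t, df

sp, t, df = pooled_t(32.8, 2.6, 8, 40.5, 2.5, 10, delta0=-5)
print(round(sp, 3), round(t, 2), df)
```

The weights 7/16 and 9/16 in the solution are just (m − 1)/(m + n − 2) and (n − 1)/(m + n − 2).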


Section 9.3 


37. 
a. This exercise calls for paired analysis. First, compute the difference between indoor and outdoor
concentrations of hexavalent chromium for each of the 33 houses. These 33 differences are
summarized as follows: n = 33, $\bar{d} = -.4239$, $s_D = .3868$, where d = (indoor value − outdoor value).
Then $t_{.025,32} = 2.037$, and a 95% confidence interval for the population mean difference between indoor
and outdoor concentration is $-.4239 \pm 2.037\left(\frac{.3868}{\sqrt{33}}\right) = -.4239 \pm .13715 = (-.5611, -.2868)$. We can
be highly confident, at the 95% confidence level, that the true average concentration of hexavalent
chromium outdoors exceeds the true average concentration indoors by between .2868 and .5611
nanograms/m³.

b. A 95% prediction interval for the difference in concentration for the 34th house is
$\bar{d} \pm t_{.025,32}\left(s_D\sqrt{1 + \frac{1}{n}}\right) = -.4239 \pm (2.037)\left(.3868\sqrt{1 + \frac{1}{33}}\right) = (-1.224, .3758)$. This prediction interval
means that the indoor concentration may exceed the outdoor concentration by as much as .3758
nanograms/m³ and that the outdoor concentration may exceed the indoor concentration by as much as
1.224 nanograms/m³, for the 34th house. Clearly, this is a wide prediction interval, largely because of
the amount of variation in the differences.


39. 
a. The accompanying normal probability plot shows that the differences are consistent with a normal
population distribution.

[Figure: Probability Plot of Differences (Normal)]

b. We want to test $H_0: \mu_D = 0$ versus $H_a: \mu_D \neq 0$. The test statistic is $t = \frac{\bar{d} - 0}{s_D/\sqrt{n}} = 2.74$, and
the two-tailed P-value is given by 2[P(T > 2.74)] ≈ 2[P(T > 2.7)] = 2[.009] = .018. Since .018 < .05,
we reject $H_0$. There is evidence to support the claim that the true average difference between intake
values measured by the two methods is not 0.


41. 
a. Let $\mu_D$ denote the true mean change in total cholesterol under the aripiprazole regimen. A 95% CI for
$\mu_D$, using the "large-sample" method, is $\bar{d} \pm z_{.025}\frac{s_D}{\sqrt{n}} = 3.75 \pm 1.96(3.878) = (-3.85, 11.35)$.

b. Now let $\mu_D$ denote the true mean change in total cholesterol under the quetiapine regimen. The
hypotheses are $H_0: \mu_D = 0$ versus $H_a: \mu_D > 0$. Assuming the distribution of cholesterol changes under
this regimen is normal, we may apply a paired t test:
$t = \frac{\bar{d} - \Delta_0}{s_D/\sqrt{n}} = \frac{9.05 - 0}{4.256} = 2.126 \Rightarrow$ P-value = $P(T_{35} \ge 2.126) \approx P(T_{35} \ge 2.1) = .02$.
Our conclusion depends on our significance level. At the $\alpha$ = .05 level, there is evidence that the true
mean change in total cholesterol under the quetiapine regimen is positive (i.e., there's been an
increase); however, we do not have sufficient evidence to draw that conclusion at the $\alpha$ = .01 level.

c. Using the "large-sample" procedure again, the 95% CI is $\bar{d} \pm 1.96\frac{s_D}{\sqrt{n}} = \bar{d} \pm 1.96\,SE(\bar{d})$. If this equals
(7.38, 9.69), then midpoint = $\bar{d}$ = 8.535 and width = $2(1.96\,SE(\bar{d}))$ = 9.69 − 7.38 = 2.31 $\Rightarrow$
$SE(\bar{d}) = \frac{2.31}{2(1.96)} = .59$. Now, use these values to construct a 99% CI (again, using a "large-sample" z
method): $\bar{d} \pm 2.576\,SE(\bar{d}) = 8.535 \pm 2.576(.59) = 8.535 \pm 1.52 = (7.02, 10.06)$.
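The back-calculation in part c (recover the midpoint and standard error from a reported z interval, then rebuild the interval at a new confidence level) can be sketched as follows. `rescale_ci` is a hypothetical helper written for this illustration; it uses unrounded normal critical values, so its endpoints can differ from the hand calculation in the last digit.

```python
from statistics import NormalDist

def rescale_ci(lower, upper, old_level, new_level):
    """Convert a large-sample z interval from one confidence level to another."""
    z_old = NormalDist().inv_cdf(1 - (1 - old_level) / 2)
    z_new = NormalDist().inv_cdf(1 - (1 - new_level) / 2)
    center = (lower + upper) / 2         # the point estimate
    se = (upper - lower) / (2 * z_old)   # half-width divided by the critical value
    return center, se, (center - z_new * se, center + z_new * se)

center, se, (lo, hi) = rescale_ci(7.38, 9.69, 0.95, 0.99)
print(round(center, 3), round(se, 2), round(lo, 2), round(hi, 2))
```

The solution rounds the standard error to .59 before multiplying by 2.576, which is why its upper endpoint is 10.06.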


43. 
a. Although there is a "jump" in the middle of the Normal Probability plot, the data follow a reasonably
straight path, so there is no strong reason for doubting the normality of the population of differences.

b. A 95% lower confidence bound for the population mean difference is:
$\bar{d} - t_{.05,14}\left(\frac{s_D}{\sqrt{n}}\right) = -38.60 - (1.761)\left(\frac{s_D}{\sqrt{15}}\right) = -38.60 - 10.54 = -49.14$. We are 95% confident that the
true mean difference between age at onset of Cushing's disease symptoms and age at diagnosis is
greater than −49.14.

c. A 95% upper confidence bound for the population mean difference is 38.60 + 10.54 = 49.14.


45. 
a. Yes, it’s quite plausible that the population distribution of differences is normal, since the 
accompanying normal probability plot of the differences is quite linear. 


[Figure: Probability Plot of diff (Normal)]


b. No. Since the data is paired, the sample means and standard deviations are not useful summaries for
inference. Those statistics would only be useful if we were analyzing two independent samples of data.
(We could deduce $\bar{d}$ by subtracting the sample means, but there's no way we could deduce $s_D$ from the
separate sample standard deviations.)


c. The hypotheses corresponding to an upper-tailed test are $H_0: \mu_D = 0$ versus $H_a: \mu_D > 0$. From the data
provided, the paired t test statistic is $t = \frac{\bar{d} - \Delta_0}{s_D/\sqrt{n}} = \frac{\bar{d} - 0}{87.4/\sqrt{15}} = 3.66$. The corresponding P-value is
$P(T_{14} \ge 3.66) \approx P(T_{14} \ge 3.7) = .001$. While the P-value stated in the article is inaccurate, the conclusion
remains the same: we have strong evidence to suggest that the mean difference in ER velocity and IR
velocity is positive. Since the measurements were negative (e.g. −130.6 deg/sec and −98.9 deg/sec),
this actually means that the magnitude of IR velocity is significantly higher, on average, than the
magnitude of ER velocity, as the authors of the article concluded.


47. From the data, n = 12, $\bar{d} = -0.73$, $s_D = 2.81$.
a. Let $\mu_D$ = the true mean difference in strength between curing under moist conditions and laboratory
drying conditions. A 95% CI for $\mu_D$ is $\bar{d} \pm t_{.025,11}s_D/\sqrt{n} = -0.73 \pm 2.201(2.81)/\sqrt{12}$ =
(−2.52 MPa, 1.05 MPa). In particular, this interval estimate includes the value zero, suggesting that
true mean strength is not significantly different under these two conditions.


b. Since n = 12, we must check that the differences are plausibly from a normal population. The normal 
probability plot below strongly substantiates that condition. 


[Figure: Normal Probability Plot of Differences]


Section 9.4 
49. Let $p_1$ denote the true proportion of correct responses to the first question; define $p_2$ similarly. The
hypotheses of interest are $H_0: p_1 - p_2 = 0$ versus $H_a: p_1 - p_2 > 0$. Summary statistics are $n_1 = n_2 = 200$,
$\hat{p}_1 = \frac{164}{200} = .82$, $\hat{p}_2 = \frac{140}{200} = .70$, and the pooled proportion is $\hat{p} = .76$. Since the sample sizes are large, we
may apply the two-proportion z test procedure.
The calculated test statistic is $z = \frac{(.82 - .70) - 0}{\sqrt{(.76)(.24)\left[\frac{1}{200} + \frac{1}{200}\right]}} = 2.81$, and the P-value is P(Z ≥ 2.81) = .0025.
Since .0025 < .05, we reject $H_0$ at the $\alpha$ = .05 level and conclude that, indeed, the true proportion of correct
answers to the context-free question is higher than the proportion of right answers to the contextual one.
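The two-proportion z test in Exercise 49 can be checked with the standard library alone. In this sketch the success counts 164 and 140 are an assumption: they are the counts implied by the sample proportions .82 and .70 with 200 respondents each. The function name `two_proportion_z` is illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 - p2 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se, pooled

z, pooled = two_proportion_z(164, 200, 140, 200)
p_upper = 1 - NormalDist().cdf(z)  # upper-tailed P-value, since Ha: p1 - p2 > 0
print(round(pooled, 2), round(z, 2), round(p_upper, 4))
```

For a lower-tailed alternative the P-value would be `NormalDist().cdf(z)` instead, and a two-sided test doubles the smaller tail.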


51. Let $p_1$ = the true proportion of patients that will experience erectile dysfunction when given no counseling,
and define $p_2$ similarly for patients receiving counseling about this possible side effect. The hypotheses of
interest are $H_0: p_1 - p_2 = 0$ versus $H_a: p_1 - p_2 < 0$.
The actual data are 8 out of 52 for the first group and 24 out of 55 for the second group, for a pooled
proportion of $\hat{p} = \frac{8 + 24}{52 + 55} = .299$. The two-proportion z test statistic is $\frac{(.153 - .436) - 0}{\sqrt{(.299)(.701)\left[\frac{1}{52} + \frac{1}{55}\right]}} = -3.20$, and
the P-value is P(Z ≤ −3.20) = .0007. Since .0007 < .05, we reject $H_0$ and conclude that a higher proportion
of men will experience erectile dysfunction if told that it's a possible side effect of the BPH treatment, than
if they weren't told of this potential side effect.


53. 
a. Let $p_1$ and $p_2$ denote the true incidence rates of GI problems for the olestra and control groups,
respectively. We wish to test $H_0: p_1 - p_2 = 0$ v. $H_a: p_1 - p_2 \neq 0$. The pooled proportion is
$\hat{p} = \frac{529(.176) + 563(.158)}{529 + 563} = .1667$, from which the relevant test statistic is
$z = \frac{.176 - .158}{\sqrt{(.1667)(.8333)[529^{-1} + 563^{-1}]}} = 0.78$. The two-sided P-value is 2P(Z ≥ 0.78) = .433 > $\alpha$ = .05,
hence we fail to reject the null hypothesis. The data do not suggest a statistically significant difference
between the incidence rates of GI problems between the two groups.
b. $n = \frac{\left(1.96\sqrt{(.35)(1.65)/2} + 1.28\sqrt{(.15)(.85) + (.2)(.8)}\right)^2}{(.05)^2} = 1210.39$, so a common sample size of m = n =
1211 would be required.
55. 


a. A 95% large sample confidence interval formula for $\ln(\theta)$ is $\ln(\hat\theta) \pm z_{\alpha/2}\sqrt{\frac{m - x}{mx} + \frac{n - y}{ny}}$. Taking the
antilogs of the upper and lower bounds gives the confidence interval for $\theta$ itself.

b. $\hat\theta = \frac{189/11,034}{104/11,037} = 1.818$, $\ln(\hat\theta) = .598$, and the standard deviation is
$\sqrt{\frac{10,845}{(11,034)(189)} + \frac{10,933}{(11,037)(104)}} = .1213$, so the CI for $\ln(\theta)$ is $.598 \pm 1.96(.1213) = (.360, .836)$.
Then taking the antilogs of the two bounds gives the CI for $\theta$ to be (1.43, 2.31). We are 95% confident
that people who do not take the aspirin treatment are between 1.43 and 2.31 times more likely to suffer
a heart attack than those who do. This suggests aspirin therapy may be effective in reducing the risk of
a heart attack.
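The log-scale interval in Exercise 55 is easy to reproduce numerically. The following is a minimal sketch using the counts quoted in part b (189 of 11,034 and 104 of 11,037); `relative_risk_ci` is a hypothetical helper name and the default critical value 1.96 corresponds to 95% confidence.

```python
import math

def relative_risk_ci(x, m, y, n, z=1.96):
    """Large-sample CI for theta = p1/p2, built on the log scale."""
    theta_hat = (x / m) / (y / n)
    se_log = math.sqrt((m - x) / (m * x) + (n - y) / (n * y))
    log_lo = math.log(theta_hat) - z * se_log
    log_hi = math.log(theta_hat) + z * se_log
    # Exponentiate the endpoints to return to the original scale
    return theta_hat, se_log, (math.exp(log_lo), math.exp(log_hi))

theta_hat, se_log, (lo, hi) = relative_risk_ci(189, 11034, 104, 11037)
print(round(theta_hat, 3), round(se_log, 4), round(lo, 2), round(hi, 2))
```

Working on the log scale keeps the interval for the ratio strictly positive, which a symmetric interval built directly around the estimate would not guarantee.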


57. $\hat{p}_1 = .550$, $\hat{p}_2 = .690$, and the 95% CI is $(.550 - .690) \pm 1.96(.106) = -.14 \pm .21 = (-.35, .07)$.


Section 9.5 


59. 
a. From Table A.9, column 5, row 8, $F_{.05,5,8} = 3.69$.
b. From column 8, row 5, $F_{.05,8,5} = 4.82$.
c. $F_{.95,5,8} = \frac{1}{F_{.05,8,5}} = .207$.
d. $F_{.95,8,5} = \frac{1}{F_{.05,5,8}} = .271$.
e. $F_{.01,10,12} = 4.30$.
f. $F_{.99,10,12} = \frac{1}{F_{.01,12,10}} = \frac{1}{4.71} = .212$.
g. $F_{.05,6,4} = 6.16$, so P(F ≤ 6.16) = .95.
h. Since $F_{.99,10,5} = \frac{1}{5.64} = .177$, P(.177 ≤ F ≤ 4.74) = P(F ≤ 4.74) − P(F ≤ .177) = .95 − .01 = .94.
61. We test $H_0: \sigma_1^2 = \sigma_2^2$ v. $H_a: \sigma_1^2 \neq \sigma_2^2$. The calculated test statistic is $f = \frac{(2.75)^2}{(4.44)^2} = .384$. To use Table
A.9, take the reciprocal: 1/f = 2.61. With numerator df = m − 1 = 5 − 1 = 4 and denominator df = n − 1 =
10 − 1 = 9 after taking the reciprocal, Table A.9 indicates the one-tailed probability is slightly more than
.10, and so the two-sided P-value is slightly more than 2(.10) = .20.
Since .20 > .10, we do not reject $H_0$ at the $\alpha$ = .1 level and conclude that there is no significant difference
between the two standard deviations.
63. Let $\sigma_1^2$ = variance in weight gain for low-dose treatment, and $\sigma_2^2$ = variance in weight gain for control
condition. We wish to test $H_0: \sigma_1^2 = \sigma_2^2$ v. $H_a: \sigma_1^2 > \sigma_2^2$. The test statistic is $f = \frac{s_1^2}{s_2^2} = 2.85$. From
Table A.9 with df = (19, 22) ≈ (20, 22), the P-value is approximately .01, and we reject $H_0$ at level .05. The
data do suggest that there is more variability in the low-dose weight gains.
65. $P\left(F_{1-\alpha/2,m-1,n-1} \le \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \le F_{\alpha/2,m-1,n-1}\right) = 1 - \alpha$. The set of inequalities inside the parentheses is clearly
equivalent to $\frac{S_2^2F_{1-\alpha/2,m-1,n-1}}{S_1^2} \le \frac{\sigma_2^2}{\sigma_1^2} \le \frac{S_2^2F_{\alpha/2,m-1,n-1}}{S_1^2}$. Substituting the sample values $s_1^2$ and $s_2^2$ yields the
confidence interval for $\frac{\sigma_2^2}{\sigma_1^2}$, and taking the square root of each endpoint yields the confidence interval for
$\frac{\sigma_2}{\sigma_1}$. With m = n = 4, we need $F_{.05,3,3} = 9.28$ and $F_{.95,3,3} = \frac{1}{9.28} = .108$. Then with $s_1$ = .160 and $s_2$ = .074,
the CI for $\frac{\sigma_2^2}{\sigma_1^2}$ is (.023, 1.99), and for $\frac{\sigma_2}{\sigma_1}$ is (.15, 1.41).


Supplementary Exercises 


67. We test $H_0: \mu_1 - \mu_2 = 0$ v. $H_a: \mu_1 - \mu_2 \neq 0$. The test statistic is
$t = \frac{(\bar{x} - \bar{y}) - \Delta}{\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}} = \frac{50}{\sqrt{\frac{27^2}{10} + \frac{41^2}{10}}} = \frac{50}{\sqrt{241}} = \frac{50}{15.524} = 3.22$. The approximate df is
$\nu = \frac{(241)^2}{\frac{(72.9)^2}{9} + \frac{(168.1)^2}{9}} = 15.6$, which we round down to 15. The P-value for a two-tailed test is
approximately 2P(T > 3.22) = 2(.003) = .006. Such a small P-value gives strong support for the
alternative hypothesis. The data indicates a significant difference. Due to the small sample sizes (10 each),
we are assuming here that compression strengths for both fixed and floating test platens are normally
distributed. And, as always, we are assuming the data were randomly sampled from their respective
populations.


69. Let $p_1$ = true proportion of returned questionnaires that included no incentive; $p_2$ = true proportion of
returned questionnaires that included an incentive. The hypotheses are $H_0: p_1 - p_2 = 0$ v. $H_a: p_1 - p_2 < 0$.
The test statistic is $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}}$.
$\hat{p}_1 = .682$ and $\hat{p}_2 = .673$; at this point, you might notice that since $\hat{p}_1 > \hat{p}_2$, the numerator of the
z statistic will be > 0, and since we have a lower tailed test, the P-value will be > .5. We fail to reject $H_0$.
This data does not suggest that including an incentive increases the likelihood of a response.
71. The center of any confidence interval for $\mu_1 - \mu_2$ is always $\bar{x}_1 - \bar{x}_2$, so $\bar{x}_1 - \bar{x}_2 = \frac{-473.3 + 1691.9}{2} = 609.3$.
Furthermore, half of the width of this interval is $\frac{1691.9 - (-473.3)}{2} = 1082.6$. Equating this value to the
expression on the right of the 95% confidence interval formula, we find $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \frac{1082.6}{1.96} = 552.35$.
For a 90% interval, the associated z value is 1.645, so the 90% confidence interval is then
$609.3 \pm (1.645)(552.35) = 609.3 \pm 908.6 = (-299.3, 1517.9)$.


73. Let $\mu_1$ and $\mu_2$ denote the true mean zinc mass for Duracell and Energizer batteries, respectively. We want to
test the hypotheses $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$. Assuming that both zinc mass distributions are
normal, we'll use a two-sample t test; the test statistic is $t = \frac{(\bar{x} - \bar{y}) - \Delta_0}{\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}} = \frac{(138.52 - 149.07) - 0}{\sqrt{\frac{(7.76)^2}{15} + \frac{(1.52)^2}{20}}} = -5.19$.
The textbook's formula for df gives $\nu$ = 14. The two-sided P-value is $2P(T_{14} \le -5.19) \approx 0$. Hence, we strongly reject $H_0$
and we conclude the mean zinc mass content for Duracell and Energizer batteries are not the same (they do
differ).


75. Since we can assume that the distributions from which the samples were taken are normal, we use the two-
sample t test. Let $\mu_1$ denote the true mean headability rating for aluminum killed steel specimens and $\mu_2$
denote the true mean headability rating for silicon killed steel. Then the hypotheses are $H_0: \mu_1 - \mu_2 = 0$ v.
$H_a: \mu_1 - \mu_2 \neq 0$. The test statistic is $t = \frac{-.66}{\sqrt{.03888 + .047203}} = \frac{-.66}{\sqrt{.086083}} = -2.25$. The approximate
degrees of freedom are $\nu = \frac{(.086083)^2}{\frac{(.03888)^2}{29} + \frac{(.047203)^2}{29}} = 57.5$, rounded down to 57. The two-tailed P-value ≈ 2(.014) = .028,
which is less than the specified significance level, so we would reject $H_0$. The data supports the article's
authors' claim.


77. 
a. The relevant hypotheses are $H_0: \mu_1 - \mu_2 = 0$ v. $H_a: \mu_1 - \mu_2 \neq 0$. Assuming both populations have
normal distributions, the two-sample t test is appropriate. m = 11, $\bar{x}$ = 98.1, $s_1$ = 14.2, n = 15,
$\bar{y}$ = 129.2, $s_2$ = 39.1. The test statistic is $t = \frac{-31.1}{\sqrt{18.3309 + 101.9207}} = \frac{-31.1}{\sqrt{120.252}} = -2.84$. The
approximate degrees of freedom $\nu = \frac{(120.252)^2}{\frac{(18.3309)^2}{10} + \frac{(101.9207)^2}{14}} = 18.64$, rounded down to 18. From Table A.8, the
two-tailed P-value ≈ 2(.006) = .012. No, obviously the results are different.

b. For the hypotheses $H_0: \mu_1 - \mu_2 = -25$ v. $H_a: \mu_1 - \mu_2 < -25$, the test statistic changes to
$t = \frac{-31.1 - (-25)}{\sqrt{120.252}} = -.556$. With df = 18, the P-value ≈ P(T < −.6) = .278. Since the P-value is greater
than any sensible choice of $\alpha$, we fail to reject $H_0$. There is insufficient evidence that the true average
strength for males exceeds that for females by more than 25 N.
79. To begin, we must find the % difference for each of the 10 meals! For the first meal, the % difference is
$\frac{\text{measured} - \text{stated}}{\text{stated}} = \frac{212 - 180}{180} = .1778$, or 17.78%. The other nine percentage differences are 45%,
21.58%, 33.04%, 5.5%, 16.49%, 15.2%, 10.42%, 81.25%, and 26.67%.
We wish to test the hypotheses $H_0: \mu = 0$ versus $H_a: \mu \neq 0$, where $\mu$ denotes the true average percent
difference for all supermarket convenience meals. A normal probability plot of these 10 values shows some
noticeable deviation from linearity, so a t-test is actually of questionable validity here, but we'll proceed
just to illustrate the method.
For this sample, n = 10, $\bar{x}$ = 27.29%, and s = 22.12%, for a t statistic of $t = \frac{27.29 - 0}{22.12/\sqrt{10}} = 3.90$.
At df = n − 1 = 9, the P-value is $2P(T_9 \ge 3.90) \approx 2(.002) = .004$. Since this is smaller than any reasonable
significance level, we reject $H_0$ and conclude that the true average percent difference between meals' stated
energy values and their measured values is non-zero.
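The summary statistics and t statistic for the ten percentage differences can be reproduced directly from the values listed in the solution. This is a small sketch using only the standard library; the variable name `pct_diff` is chosen here, and the P-value is still taken from the t table since the standard library has no t distribution.

```python
import math
from statistics import mean, stdev

# Percent differences for the 10 meals, as listed in the solution
pct_diff = [17.78, 45, 21.58, 33.04, 5.5, 16.49, 15.2, 10.42, 81.25, 26.67]

n = len(pct_diff)
xbar = mean(pct_diff)
s = stdev(pct_diff)                   # sample standard deviation (n - 1 divisor)
t = (xbar - 0) / (s / math.sqrt(n))   # one-sample t statistic for H0: mu = 0

print(n, round(xbar, 2), round(s, 2), round(t, 2))
```

The single value 81.25 contributes most of the sample variance, which is consistent with the solution's remark that the normality assumption is doubtful here.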


81. The normal probability plot below indicates the data for good visibility does not come from a normal
distribution. Thus, a t-test is not appropriate for this small a sample size. (The plot for poor visibility isn't
as bad.) That is, a pooled t test should not be used here, nor should an "unpooled" two-sample t test be used
(since it relies on the same normality assumption).


83. We wish to test $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 \neq \mu_2$.
Unpooled:
With $H_0: \mu_1 - \mu_2 = 0$ v. $H_a: \mu_1 - \mu_2 \neq 0$, we will reject $H_0$ if P-value < $\alpha$.
$\nu = \frac{\left(\frac{.79^2}{14} + \frac{1.52^2}{12}\right)^2}{\frac{(.79^2/14)^2}{13} + \frac{(1.52^2/12)^2}{11}} = 15.95$, rounded down to 15, and the test statistic $t = \frac{8.48 - 9.36}{\sqrt{\frac{.79^2}{14} + \frac{1.52^2}{12}}} = \frac{-.88}{.4869} = -1.81$ leads to a P-
value of about $2P(T_{15} > 1.8) = 2(.046) = .092$.
Pooled:
The degrees of freedom are $\nu$ = m + n − 2 = 14 + 12 − 2 = 24 and the pooled variance
is $\left(\frac{13}{24}\right)(.79)^2 + \left(\frac{11}{24}\right)(1.52)^2 = 1.3970$, so $s_p = 1.181$. The test statistic is
$t = \frac{-.88}{1.181\sqrt{\frac{1}{14} + \frac{1}{12}}} = \frac{-.88}{.465} = -1.89$. The P-value = $2P(T_{24} > 1.9) = 2(.035) = .070$.
With the pooled method, there are more degrees of freedom, and the P-value is smaller than with the
unpooled method. That is, if we are willing to assume equal variances (which might or might not be valid
here), the pooled test is more capable of detecting a significant difference between the sample means.


85.
a. With n denoting the second sample size, the first is m = 3n. We then wish $20 = 2(2.58)\sqrt{\frac{900}{3n} + \frac{400}{n}}$,
which yields n = 47, m = 141.

b. We wish to find the n which minimizes $2z_{\alpha/2}\sqrt{\frac{900}{400 - n} + \frac{400}{n}}$, or equivalently, the n which minimizes
$\frac{900}{400 - n} + \frac{400}{n}$. Taking the derivative with respect to n and equating to 0 yields
$900(400 - n)^{-2} - 400n^{-2} = 0$, whence $9n^2 = 4(400 - n)^2$, or $5n^2 + 3200n - 640,000 = 0$. The solution
is n = 160, and thus m = 400 − n = 240.


87. We want to test the hypothesis $H_0: \mu_1 \le 1.5\mu_2$ v. $H_a: \mu_1 > 1.5\mu_2$, or, using the hint, $H_0: \theta \le 0$ v. $H_a: \theta > 0$.
Our point estimate of $\theta$ is $\hat\theta = \bar{X}_1 - 1.5\bar{X}_2$, whose estimated standard error equals $s(\hat\theta) = \sqrt{\frac{s_1^2}{n_1} + (1.5)^2\frac{s_2^2}{n_2}}$,
using the fact that $V(\hat\theta) = \frac{\sigma_1^2}{n_1} + (1.5)^2\frac{\sigma_2^2}{n_2}$. Plug in the values provided to get a test statistic
$t = \frac{22.63 - 1.5(14.15) - 0}{\sqrt{2.8975}} \approx 0.83$. A conservative df estimate here is $\nu$ = 50 − 1 = 49. Since P(T ≥ 0.83) ≈ .20
and .20 > .05, we fail to reject $H_0$ at the 5% significance level. The data does not suggest that the average
tip after an introduction is more than 50% greater than the average tip without introduction.


200 14.142 vn re 
89, A. =0, 6, =0,=10,d=1, o=,/,—=—_——, =| 1.645-—~“__ |, giving # = .9015, .8264, 
0 Oo, =O, o a dn so B Tod givi gp 


.0294, and .0000 for n = 25, 100, 2500, and 10,000 respectively. If the xs referred to true average IQs 
resulting from two different conditions, 4,— 2, =1 would have little practical si gnificance, yet very large 
sample sizes would yield statistical significance in this situation. 
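The four type II error probabilities in Exercise 89 can be recomputed with the standard normal cdf. The sketch below assumes an upper-tailed z test at level .05 with equal sample sizes n, as in the solution; `beta_upper_tailed` is an illustrative name, and because it uses unrounded critical values its output can differ from the table-based figures in the third decimal place.

```python
import math
from statistics import NormalDist

def beta_upper_tailed(n, sigma1=10.0, sigma2=10.0, d=1.0, alpha=0.05):
    """P(type II error) for an upper-tailed two-sample z test when mu1 - mu2 = d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    sd = math.sqrt(sigma1**2 / n + sigma2**2 / n)  # sd of Xbar - Ybar
    return NormalDist().cdf(z_alpha - d / sd)

for n in (25, 100, 2500, 10000):
    print(n, round(beta_upper_tailed(n), 4))
```

The steep drop between n = 100 and n = 2500 illustrates the point of the exercise: with enough data, even a practically negligible difference of 1 is detected almost surely.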


91. $H_0: p_1 = p_2$ will be rejected at level $\alpha$ in favor of $H_a: p_1 > p_2$ if $z \ge z_\alpha$. With $\hat{p}_1 = .10$ and
$\hat{p}_2 = .0668$, $\hat{p} = .0834$ and $z = \frac{.0332}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}} = 4.2$, so $H_0$ is rejected at any reasonable $\alpha$ level. It appears
that a response is more likely for a white name than for a black name.




93. 
a. Let $\mu_1$ and $\mu_2$ denote the true average weights for operations 1 and 2, respectively. The relevant
hypotheses are $H_0: \mu_1 - \mu_2 = 0$ v. $H_a: \mu_1 - \mu_2 \neq 0$. The value of the test statistic is
$t = \frac{(1402.24 - 1419.63)}{\sqrt{\frac{(10.97)^2}{30} + \frac{(9.96)^2}{30}}} = \frac{-17.39}{\sqrt{4.011363 + 3.30672}} = \frac{-17.39}{\sqrt{7.318083}} = -6.43$.
At df = $\nu = \frac{(7.318083)^2}{\frac{(4.011363)^2}{29} + \frac{(3.30672)^2}{29}} = 57.5$, rounded down to 57, 2P(T ≤ −6.43) ≈ 0, so we can reject $H_0$ at level
.05. The data indicates that there is a significant difference between the true mean weights of the
packages for the two operations.

b. $H_0: \mu_1 = 1400$ will be tested against $H_a: \mu_1 > 1400$ using a one-sample t test with test statistic
$t = \frac{\bar{x} - 1400}{s_1/\sqrt{m}}$. With degrees of freedom = 29, we reject $H_0$ if $t \ge t_{.05,29} = 1.699$. The test statistic value
is $t = \frac{1402.24 - 1400}{10.97/\sqrt{30}} = \frac{2.24}{2.00} = 1.1$. Because 1.1 < 1.699, $H_0$ is not rejected. True average weight does
not appear to exceed 1400.


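95. A large-sample confidence interval for $\lambda_1 - \lambda_2$ is $(\hat\lambda_1 - \hat\lambda_2) \pm z_{\alpha/2}\sqrt{\frac{\hat\lambda_1}{m} + \frac{\hat\lambda_2}{n}}$, or $(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{\frac{\bar{x}}{m} + \frac{\bar{y}}{n}}$.
With $\bar{x}$ = 1.616 and $\bar{y}$ = 2.557, the 95% confidence interval for $\lambda_1 - \lambda_2$ is −.94 ± 1.96(.177) = −.94 ± .35 =
(−1.29, −.59).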
Section 10.1


1. The computed value of F is $f = \frac{MSTr}{MSE} = 2.44$. Degrees of freedom are I − 1 = 4 and I(J − 1) =
(5)(3) = 15. From Table A.9, $F_{.05,4,15} = 3.06$ and $F_{.10,4,15} = 2.36$; since our computed value of 2.44 is
between those values, it can be said that .05 < P-value < .10. Therefore, $H_0$ is not rejected at the $\alpha$ = .05
level. The data do not provide statistically significant evidence of a difference in the mean tensile strengths
of the different types of copper wires.


3. With $\mu_i$ = true average lumen output for brand i bulbs, we wish to test $H_0: \mu_1 = \mu_2 = \mu_3$ v. $H_a$: at least two
$\mu_i$'s are different. MSTr = $\hat\sigma_B^2$ = 295.60, MSE = $\hat\sigma_W^2$ = 227.30, so
$f = \frac{MSTr}{MSE} = \frac{295.60}{227.30} = 1.30$.
For finding the P-value, we need degrees of freedom I − 1 = 2 and I(J − 1) = 21. In the 2nd column and 21st
row of Table A.9, we see that $1.30 < F_{.10,2,21} = 2.57$, so the P-value > .10. Since .10 is not < .05, we
cannot reject $H_0$. There are no statistically significant differences in the average lumen outputs among the
three brands of bulbs.
5. $\mu_i$ = true mean modulus of elasticity for grade i (i = 1, 2, 3). We test $H_0: \mu_1 = \mu_2 = \mu_3$ vs. $H_a$: at least two
$\mu_i$'s are different. Grand mean = 1.5367,
$MSTr = \frac{10}{2}\left[(1.63 - 1.5367)^2 + (1.56 - 1.5367)^2 + (1.42 - 1.5367)^2\right] = .1143$,
$MSE = \frac{1}{3}\left[(.27)^2 + (.24)^2 + (.26)^2\right] = .0660$, $f = \frac{MSTr}{MSE} = \frac{.1143}{.0660} = 1.73$. At df = (2, 27), 1.73 < 2.51 $\Rightarrow$ the
P-value is more than .10. Hence, we fail to reject $H_0$. The three grades do not appear to differ significantly.


7. Let $\mu_i$ denote the true mean electrical resistivity for the ith mixture (i = 1, …, 6).
The hypotheses are $H_0: \mu_1 = \ldots = \mu_6$ versus $H_a$: at least two of the $\mu_i$'s are different.
There are I = 6 different mixtures and J = 26 measurements for each mixture. That information provides the
df values in the table. Working backwards, SSE = I(J − 1)MSE = 2089.350; SSTr = SST − SSE = 3575.065;
MSTr = SSTr/(I − 1) = 715.013; and, finally, f = MSTr/MSE = 51.3.

Source       df    SS         MS        f
Treatments   5     3575.065   715.013   51.3
Error        150   2089.350   13.929
Total        155   5664.415

The P-value is $P(F_{5,150} \ge 51.3) \approx 0$, and so $H_0$ will be rejected at any reasonable significance level. There is
strong evidence that true mean electrical resistivity is not the same for all 6 mixtures.
9. The summary quantities are $x_{1\cdot} = 34.3$, $x_{2\cdot} = 39.6$, $x_{3\cdot} = 33.0$, $x_{4\cdot} = 41.9$, $x_{\cdot\cdot} = 148.8$, $\sum\sum x_{ij}^2 = 946.68$, so
$CF = \frac{(148.8)^2}{24} = 922.56$, SST = 946.68 − 922.56 = 24.12, $SSTr = \frac{(34.3)^2 + \ldots + (41.9)^2}{6} - 922.56 = 8.98$,
SSE = 24.12 − 8.98 = 15.14.

Source       df    SS      MS     f
Treatments   3     8.98    2.99   3.95
Error        20    15.14   .757
Total        23    24.12

Since $3.10 = F_{.05,3,20} < 3.95 < 4.94 = F_{.01,3,20}$, .01 < P-value < .05, and $H_0$ is rejected at level .05.


Section 10.2 


11. $Q_{.05,5,15} = 4.37$, $w = 4.37\sqrt{\frac{MSE}{J}} = 36.09$. The brands seem to divide into two groups: 1, 3, and 4; and 2
and 5; with no significant differences within each group but all between-group differences are significant.

    3       1       4       2       5
  437.5   462.0   469.3   512.8   532.1

13. Brand 1 does not differ significantly from 3 or 4, 2 does not differ significantly from 4 or 5, 3 does not
differ significantly from 1, 4 does not differ significantly from 1 or 2, 5 does not differ significantly from 2,
but all other differences (e.g., 1 with 2 and 5, 2 with 3, etc.) do appear to be significant.

    3       1       4       2       5
  427.5   462.0   469.3   502.8   532.1

15. In Exercise 10.7, I = 6 and J = 26, so the critical value is $Q_{.05,6,150} \approx Q_{.05,6,120} = 4.10$, and MSE = 13.929. So,
$w \approx 4.10\sqrt{\frac{13.929}{26}} = 3.00$. So, sample means less than 3.00 apart will belong to the same underscored set.
Three distinct groups emerge: the first mixture (in the order below), then mixtures 2-4, and finally mixtures
5-6.

  14.18   17.94   18.00   18.00   25.74   27.67

17. $\theta = \sum c_i\mu_i$, where $c_1 = c_2 = .5$ and $c_3 = -1$, so $\hat\theta = .5\bar{x}_{1\cdot} + .5\bar{x}_{2\cdot} - \bar{x}_{3\cdot} = -.527$ and $\sum c_i^2 = 1.50$. With
$t_{.025,27} = 2.052$ and MSE = .0660, the desired CI is (from (10.5))
$-.527 \pm (2.052)\sqrt{\frac{(.0660)(1.50)}{10}} = -.527 \pm .204 = (-.731, -.323)$.
19. MSTr = 140, error df = 12, so $f = \frac{140}{SSE/12} = \frac{1680}{SSE}$ and $F_{.05,2,12} = 3.89$.
$w = Q_{.05,3,12}\sqrt{\frac{MSE}{J}} = 3.77\sqrt{\frac{SSE}{60}} = .4867\sqrt{SSE}$. Thus we wish $\frac{1680}{SSE} > 3.89$ (significant f) and
$.4867\sqrt{SSE} > 10$ (= 20 − 10, the difference between the extreme $\bar{x}_{i\cdot}$'s, so no significant differences are
identified). These become 431.88 > SSE and SSE > 422.16, so SSE = 425 will work.


21. 
a. The hypotheses are $H_0: \mu_1 = \ldots = \mu_6$ v. $H_a$: at least two of the $\mu_i$'s are different. Grand mean = 222.167,
MSTr = 38,015.1333, MSE = 1,681.8333, and f = 22.6.
At df = (5, 78) ≈ (5, 60), 22.6 ≥ 4.76 $\Rightarrow$ P-value < .001. Hence, we reject $H_0$. The data indicate there is
a dependence on injection regimen.

b. Assume $t_{.005,78} \approx 2.645$.

i) Confidence interval for $\mu_1 - \frac{1}{5}(\mu_2 + \mu_3 + \mu_4 + \mu_5 + \mu_6)$: $\sum c_i\bar{x}_{i\cdot} \pm t_{\alpha/2,I(J-1)}\sqrt{\frac{MSE(\sum c_i^2)}{J}}$
$= -67.4 \pm (2.645)\sqrt{\frac{1,681.8333(1.2)}{14}} = (-99.16, -35.64)$.

ii) Confidence interval for $\frac{1}{4}(\mu_2 + \mu_3 + \mu_4 + \mu_5) - \mu_6$:
$= 61.75 \pm (2.645)\sqrt{\frac{1,681.8333(1.25)}{14}} = (29.34, 94.16)$


Section 10.3 


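23. $J_1 = 5$, $J_2 = 4$, $J_3 = 4$, $J_4 = 5$, $\bar{x}_{1\cdot} = 58.28$, $\bar{x}_{2\cdot} = 55.40$, $\bar{x}_{3\cdot} = 50.85$, $\bar{x}_{4\cdot} = 45.50$, MSE = 8.89.
With $W_{ij} = Q_{.05,4,14}\sqrt{\frac{MSE}{2}\left(\frac{1}{J_i} + \frac{1}{J_j}\right)} = 4.11\sqrt{\frac{8.89}{2}\left(\frac{1}{J_i} + \frac{1}{J_j}\right)}$,
$\bar{x}_{1\cdot} - \bar{x}_{2\cdot} \pm W_{12} = (2.88) \pm (5.81)$; $\bar{x}_{1\cdot} - \bar{x}_{3\cdot} \pm W_{13} = (7.43) \pm (5.81)$ *; $\bar{x}_{1\cdot} - \bar{x}_{4\cdot} \pm W_{14} = (12.78) \pm (5.48)$ *;
$\bar{x}_{2\cdot} - \bar{x}_{3\cdot} \pm W_{23} = (4.55) \pm (6.13)$; $\bar{x}_{2\cdot} - \bar{x}_{4\cdot} \pm W_{24} = (9.90) \pm (5.81)$ *; $\bar{x}_{3\cdot} - \bar{x}_{4\cdot} \pm W_{34} = (5.35) \pm (5.81)$.

A * indicates an interval that doesn't include zero, corresponding to $\mu$'s that are judged significantly
different. This underscoring pattern does not have a very straightforward interpretation.

  4   3   2   1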
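The six modified Tukey intervals in Exercise 23 follow one formula with different sample sizes, so a loop is a convenient check. In this sketch the critical value Q = 4.11 is taken from the solution as given (it comes from a studentized range table, not from the code), and `tukey_kramer` is a hypothetical helper name.

```python
import math
from itertools import combinations

Q = 4.11      # studentized range critical value quoted in the solution
MSE = 8.89
J = {1: 5, 2: 4, 3: 4, 4: 5}                        # sample sizes
xbar = {1: 58.28, 2: 55.40, 3: 50.85, 4: 45.50}     # sample means

def tukey_kramer(i, j):
    """Return (difference, half-width W_ij, significant?) for treatments i and j."""
    diff = xbar[i] - xbar[j]
    w = Q * math.sqrt(MSE / 2 * (1 / J[i] + 1 / J[j]))
    return diff, w, abs(diff) > w

significant = []
for i, j in combinations(sorted(J), 2):
    diff, w, sig = tukey_kramer(i, j)
    print(f"{i},{j}: {diff:.2f} +/- {w:.2f}" + (" *" if sig else ""))
    if sig:
        significant.append((i, j))
```

Because the half-width depends on both sample sizes, a smaller difference can be significant while a larger one is not, which is one reason the underscoring pattern can be awkward to interpret.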


25. 
a. The distributions of the polyunsaturated fat percentages for each of the four regimens must be normal 
with equal variances. 


b. We have all the $\bar{x}_{i\cdot}$'s, and we need the grand mean:
$\bar{x}_{\cdot\cdot} = \frac{8(43.0) + 13(42.4) + 17(43.1) + 14(43.5)}{52} = \frac{2236.9}{52} = 43.017$;

$SSTr = \sum J_i(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2 = 8(43.0 - 43.017)^2 + 13(42.4 - 43.017)^2 + 17(43.1 - 43.017)^2 + 14(43.5 - 43.017)^2 = 8.334$ and $MSTr = \frac{8.334}{3} = 2.778$.

$SSE = \sum (J_i - 1)s_i^2 = 7(1.5)^2 + 12(1.3)^2 + 16(1.2)^2 + 13(1.2)^2 = 77.79$ and $MSE = \frac{77.79}{48} = 1.621$. Then
$f = \frac{MSTr}{MSE} = \frac{2.778}{1.621} = 1.714$.

Since $1.714 < F_{.10,3,50} = 2.20$, we can say that the P-value is > .10. We do not reject the null
hypothesis at significance level .10 (or any smaller), so we conclude that the data suggest no
difference in the percentages for the different regimens.
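The F statistic in Exercise 25 is built entirely from summary statistics, which makes it easy to verify. The sketch below assumes the sample sizes 8, 13, 17, 14 used in the grand-mean calculation; the function name `anova_from_summary` is hypothetical.

```python
def anova_from_summary(sizes, means, sds):
    """One-way ANOVA from group sizes, means, and standard deviations."""
    n = sum(sizes)
    num_groups = len(sizes)
    grand = sum(J * m for J, m in zip(sizes, means)) / n
    sstr = sum(J * (m - grand) ** 2 for J, m in zip(sizes, means))
    sse = sum((J - 1) * s ** 2 for J, s in zip(sizes, sds))
    mstr = sstr / (num_groups - 1)
    mse = sse / (n - num_groups)
    return grand, sstr, sse, mstr / mse

grand, sstr, sse, f = anova_from_summary(
    sizes=[8, 13, 17, 14],
    means=[43.0, 42.4, 43.1, 43.5],
    sds=[1.5, 1.3, 1.2, 1.2],
)
print(round(grand, 3), round(sstr, 3), round(sse, 2), round(f, 3))
```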


27.
a. Let $\mu_i$ = true average folacin content for specimens of brand i. The hypotheses to be tested are
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ vs. $H_a$: at least two of the $\mu_i$'s are different. $\sum\sum x_{ij}^2 = 1246.88$ and
$\frac{x_{\cdot\cdot}^2}{n} = \frac{(168.4)^2}{24} = 1181.61$, so SST = 65.27; $\sum\frac{x_{i\cdot}^2}{J_i} = \frac{(57.9)^2}{7} + \frac{(37.5)^2}{5} + \frac{(38.1)^2}{6} + \frac{(34.9)^2}{6} = 1205.10$, so
SSTr = 1205.10 - 1181.61 = 23.49.

Source       df   SS      MS     F
Treatments   3    23.49   7.83   3.75
Error        20   41.78   2.09
Total        23   65.27

With numerator df = 3 and denominator df = 20, $F_{.05,3,20} = 3.10 < 3.75 < F_{.01,3,20} = 4.94$, so the P-value
is between .01 and .05. We reject $H_0$ at the .05 level: at least one of the pairs of brands of green tea has
different average folacin content.

b. With $\bar{x}_{i\cdot}$ = 8.27, 7.50, 6.35, and 5.82 for i = 1, 2, 3, 4, we calculate the residuals $x_{ij} - \bar{x}_{i\cdot}$ for all
observations. A normal probability plot appears below and indicates that the distribution of residuals
could be normal, so the normality assumption is plausible. The sample standard deviations are 1.463,
1.681, 1.060, and 1.551, so the equal variance assumption is plausible (since the largest sd is less than
twice the smallest sd).


Normal Probability Plot for ANOVA Residuals 
c. $Q_{.05,4,20} = 3.96$ and $W_{ij} = 3.96\sqrt{\frac{2.09}{2}\left(\frac{1}{J_i} + \frac{1}{J_j}\right)}$, so the Modified Tukey intervals are:

Pair   Interval        Pair   Interval
1,2    .77 ± 2.37      2,3    1.15 ± 2.45
1,3    1.92 ± 2.25     2,4    1.68 ± 2.45
1,4    2.45 ± 2.25 *   3,4    .53 ± 2.34


Only Brands 1 and 4 are significantly different from each other.


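29. $E(SSTr) = E\left(\sum J_i\bar{X}_{i\cdot}^2 - n\bar{X}_{\cdot\cdot}^2\right) = \sum J_iE(\bar{X}_{i\cdot}^2) - nE(\bar{X}_{\cdot\cdot}^2)$

$= \sum J_i\left[V(\bar{X}_{i\cdot}) + (E(\bar{X}_{i\cdot}))^2\right] - n\left[V(\bar{X}_{\cdot\cdot}) + (E(\bar{X}_{\cdot\cdot}))^2\right] = \sum J_i\left[\frac{\sigma^2}{J_i} + \mu_i^2\right] - n\left[\frac{\sigma^2}{n} + \left(\frac{\sum J_i\mu_i}{n}\right)^2\right]$

$= I\sigma^2 + \sum J_i(\mu + \alpha_i)^2 - \sigma^2 - \frac{1}{n}\left[\sum J_i(\mu + \alpha_i)\right]^2 = (I-1)\sigma^2 + \sum J_i\mu^2 + 2\mu\sum J_i\alpha_i + \sum J_i\alpha_i^2 - \frac{1}{n}[n\mu + 0]^2$

$= (I-1)\sigma^2 + \mu^2n + 2\mu \cdot 0 + \sum J_i\alpha_i^2 - n\mu^2 = (I-1)\sigma^2 + \sum J_i\alpha_i^2$, from which E(MSTr) is obtained through
division by (I - 1).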


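31. With $\sigma = 1$ (any other $\sigma$ would yield the same $\phi$), $\alpha_1 = -1$, $\alpha_2 = \alpha_3 = 0$, $\alpha_4 = 1$,

$\phi^2 = \frac{1\left(5(-1)^2 + 4(0)^2 + 4(0)^2 + 5(1)^2\right)}{4} = 2.5$, $\phi = 1.58$, $\nu_1 = 3$, $\nu_2 = 14$, and power ≈ .65.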


33. $g(x) = x\left(1 - \frac{x}{n}\right) = nu(1 - u)$ where $u = \frac{x}{n}$, so $h(x) = \int [u(1-u)]^{-1/2}\,du$. From a table of integrals, this
gives $h(x) = \arcsin(\sqrt{u}) = \arcsin\left(\sqrt{\frac{x}{n}}\right)$ as the appropriate transformation.
Supplementary Exercises


35.
a. The hypotheses are $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ vs. $H_a$: at least two of the $\mu_i$'s are different. The calculated
test statistic is f = 3.68. Since $F_{.05,3,20} = 3.10 < 3.68 < F_{.01,3,20} = 4.94$, the P-value is between .01 and
.05. Thus, we fail to reject $H_0$ at $\alpha$ = .01. At the 1% level, the means do not appear to differ
significantly.

b. We reject $H_0$ when the P-value < $\alpha$. Since .029 is not < .01, we still fail to reject $H_0$.


37. Let $\mu_i$ = true average amount of motor vibration for each of five bearing brands. Then the hypotheses are
$H_0: \mu_1 = \dots = \mu_5$ vs. $H_a$: at least two of the $\mu_i$'s are different. The ANOVA table follows:

Source       df   SS       MS      F
Treatments   4    30.855   7.714   8.44
Error        25   22.838   0.914
Total        29   53.694

$8.44 > F_{.001,4,25} = 6.49$, so P-value < .001 < .05, so we reject $H_0$. At least two of the means differ from one
another. The Tukey multiple comparisons are appropriate. $Q_{.05,5,25} = 4.15$ from Minitab output; or, using
Table A.10, we can approximate with $Q_{.05,5,24} = 4.17$. $w = 4.15\sqrt{.914/6} = 1.620$.

Pair   $\bar{x}_{i\cdot} - \bar{x}_{j\cdot}$     Pair   $\bar{x}_{i\cdot} - \bar{x}_{j\cdot}$
1,2    -2.267*      2,4    1.217
1,3    0.016        2,5    2.867*
1,4    -1.050       3,4    -1.066
1,5    0.600        3,5    0.584
2,3    2.283*       4,5    1.650*

*Indicates significant pairs.

5 3 1 4 2
(underscores join 5, 3, and 1; 3, 1, and 4; and 4 and 2)


39. $\hat{\theta}$ = 2.58 minus the average of the other four sample means = .165, $t_{.025,25} = 2.060$, MSE = .108, and
$\sum c_i^2 = (1)^2 + (-.25)^2 + (-.25)^2 + (-.25)^2 + (-.25)^2 = 1.25$, so a 95% confidence interval for $\theta$ is

$.165 \pm 2.060\sqrt{\frac{(.108)(1.25)}{6}} = .165 \pm .309 = (-.144, .474)$. This interval does include zero, so 0 is a plausible
value for $\theta$.


41. This is a random effects situation. $H_0: \sigma_A^2 = 0$ states that variation in laboratories doesn't contribute to
variation in percentage. SST = 86,078.9897 - 86,077.2224 = 1.7673, SSTr = 1.0559, and SSE = .7114, so
$f = \frac{1.0559/3}{.7114/8} = 3.96$.

At df = (3, 8), 2.92 < 3.96 < 4.07 => .05 < P-value < .10, so $H_0$ cannot be rejected at level .05. Variation in
laboratories does not appear to be present.
43. $\sqrt{(I-1)(MSE)(F_{.05,I-1,n-I})} = \sqrt{(2)(2.39)(3.63)} = 4.166$. For $\mu_1 - \mu_2$, $c_1 = 1$, $c_2 = -1$, and $c_3 = 0$, so

$\sqrt{\sum\frac{c_i^2}{J_i}} = \sqrt{\frac{1}{8} + \frac{1}{5}} = .570$. Similarly, for $\mu_1 - \mu_3$, $\sqrt{\frac{1}{8} + \frac{1}{6}} = .540$; for $\mu_2 - \mu_3$,

$\sqrt{\frac{1}{5} + \frac{1}{6}} = .606$, and for $.5\mu_1 + .5\mu_2 - \mu_3$, $\sqrt{\frac{.5^2}{8} + \frac{.5^2}{5} + \frac{(-1)^2}{6}} = .498$.


Contrast                        Estimate                 Interval
$\mu_1 - \mu_2$                 25.59 - 26.92 = -1.33    (-1.33) ± (.570)(4.166) = (-3.70, 1.04)
$\mu_1 - \mu_3$                 25.59 - 28.17 = -2.58    (-2.58) ± (.540)(4.166) = (-4.83, -.33)
$\mu_2 - \mu_3$                 26.92 - 28.17 = -1.25    (-1.25) ± (.606)(4.166) = (-3.77, 1.27)
$.5\mu_1 + .5\mu_2 - \mu_3$     -1.92                    (-1.92) ± (.498)(4.166) = (-3.99, 0.15)


Only the contrast between $\mu_1$ and $\mu_3$ is judged significant, since its calculated interval is the only one that does not contain 0.
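The Scheffé-type intervals of Exercise 43 all share the multiplier $\sqrt{(I-1)(MSE)(F)}$, so they are convenient to generate in a loop. The following is a sketch using the sample sizes (8, 5, 6), means, MSE, and F critical value quoted in the solution; the function name `scheffe_interval` is hypothetical.

```python
import math

sizes = [8, 5, 6]
means = [25.59, 26.92, 28.17]
mse = 2.39
f_crit = 3.63  # F_{.05,2,16}, taken from the solution

def scheffe_interval(coeffs):
    """Simultaneous interval for the contrast sum(c_i * mu_i)."""
    num_groups = len(sizes)
    estimate = sum(c * m for c, m in zip(coeffs, means))
    scale = math.sqrt(sum(c * c / J for c, J in zip(coeffs, sizes)))
    margin = math.sqrt((num_groups - 1) * mse * f_crit) * scale
    return estimate - margin, estimate + margin

contrasts = {
    "mu1 - mu2": [1, -1, 0],
    "mu1 - mu3": [1, 0, -1],
    "mu2 - mu3": [0, 1, -1],
    ".5mu1 + .5mu2 - mu3": [0.5, 0.5, -1],
}
results = {name: scheffe_interval(c) for name, c in contrasts.items()}
for name, (lo, hi) in results.items():
    print(name, round(lo, 2), round(hi, 2))
```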


45. $\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot} = c(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})$ and $Y_{ij} - \bar{Y}_{i\cdot} = c(X_{ij} - \bar{X}_{i\cdot})$, so each sum of squares involving Y will be the
corresponding sum of squares involving X multiplied by $c^2$. Since F is a ratio of two sums of squares, $c^2$
appears in both the numerator and denominator. So $c^2$ cancels, and F computed from $Y_{ij}$'s = F computed
from $X_{ij}$'s.




CHAPTER 11 


Section 11.1 


1. 
a. The test statistic is $f_A = \frac{MSA}{MSE} = \frac{SSA/(I-1)}{SSE/((I-1)(J-1))} = \frac{442.0/(4-1)}{123.4/((4-1)(3-1))} = 7.16$. Compare this to the
F distribution with df = (4 - 1, (4 - 1)(3 - 1)) = (3, 6): 4.76 < 7.16 < 9.78 => .01 < P-value < .05. In
particular, we reject $H_{0A}$ at the .05 level and conclude that at least one of the factor A means is
different (equivalently, at least one of the $\alpha_i$'s is not zero).

b. Similarly, $f_B = \frac{SSB/(J-1)}{SSE/((I-1)(J-1))} = \frac{428.6/(3-1)}{123.4/((4-1)(3-1))} = 10.42$. At df = (2, 6), 5.14 < 10.42 < 10.92
=> .01 < P-value < .05. In particular, we reject $H_{0B}$ at the .05 level and conclude that at least one of the
factor B means is different (equivalently, at least one of the $\beta_j$'s is not zero).
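Both F ratios in this exercise come from the same mean-square arithmetic for a two-factor layout with one observation per cell. A minimal sketch, with a hypothetical helper name `two_way_f`, is:

```python
def two_way_f(ssa, ssb, sse, I, J):
    """F ratios for factors A and B in a two-factor ANOVA with K = 1."""
    mse = sse / ((I - 1) * (J - 1))
    f_a = (ssa / (I - 1)) / mse
    f_b = (ssb / (J - 1)) / mse
    return f_a, f_b

# Values from Exercise 1: I = 4 levels of A, J = 3 levels of B.
f_a, f_b = two_way_f(ssa=442.0, ssb=428.6, sse=123.4, I=4, J=3)
print(round(f_a, 2), round(f_b, 2))
```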
3.
a. The entries of this ANOVA table were produced with software.
Source    df   SS         MS          F
Medium    1    0.053220   0.0532195   18.77
Current   3    0.179441   0.0598135   21.10
Error     3    0.008505   0.0028350
Total     7    0.241165


To test $H_{0A}: \alpha_1 = \alpha_2 = 0$ (no liquid medium effect), the test statistic is $f_A$ = 18.77; at df = (1, 3), the P-
value is .023 from software (or between .01 and .05 from Table A.9). Hence, we reject $H_{0A}$ and
conclude that medium (oil or water) affects mean material removal rate.


To test $H_{0B}: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ (no current effect), the test statistic is $f_B$ = 21.10; at df = (3, 3), the P-
value is .016 from software (or between .01 and .05 from Table A.9). Hence, we reject $H_{0B}$ and
conclude that working current affects mean material removal rate as well.


b. Using a .05 significance level, with J = 4 and error df = 3 we require $Q_{.05,4,3} = 6.825$. Then, the metric
for significant differences is $w = 6.825\sqrt{0.0028350/2} = 0.257$. The means happen to increase with
current; sample means and the underscore scheme appear below.


Current:                    10      15      20      25
$\bar{x}_{\cdot j}$:        0.201   0.324   0.462   0.602
(underscores join 10 and 15, 15 and 20, and 20 and 25)
5.
Source      df   SS       MS        F
Angle       3    58.16    19.3867   2.5565
Connector   4    246.97   61.7425   8.1419
Error       12   91.00    7.5833
Total       19   396.13

We're interested in $H_{0A}: \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0$ versus $H_a$: at least one $\alpha_i \neq 0$. $f_A = 2.5565 < F_{.01,3,12} = 5.95$ =>
P-value > .01, so we fail to reject $H_{0A}$. The data fail to indicate any effect due to the angle of pull, at the .01
significance level.

7. 
a. The entries of this ANOVA table were produced with software. 
Source df SS MS F 
Brand 2 22.8889 11.4444 8.96 
Operator 2 27.5556 13.7778 10.78 
Error 4 5.1111 1.2778 
Total 8 55.5556 
The calculated test statistic for the F-test on brand is $f_A$ = 8.96. At df = (2, 4), the P-value is .033 from
software (or between .01 and .05 from Table A.9). Hence, we reject $H_{0A}$ at the .05 level and conclude
that lathe brand has a statistically significant effect on the percent of acceptable product.

b. The block-effect test statistic is $f_B$ = 10.78, which is quite large (a P-value of .024 at df = (2, 4)). So,
yes, including this operator blocking variable was a good idea, because there is significant variation 
due to different operators. If we had not controlled for such variation, it might have affected the 
analysis and conclusions. 

9. The entries of this ANOVA table were produced with software. 
Source df SS MS F 
Treatment 3 81.1944 27.0648 22.36 
Block 8 66.5000 8.3125 6.87 
Error 24 29.0556 1.2106 
Total 35 176.7500 


At df = (3, 24), f = 22.36 > 7.55 => P-value < .001. Therefore, we strongly reject $H_{0A}$ and conclude that
there is an effect due to treatments. We follow up with Tukey's procedure:


$Q_{.05,4,24} = 3.90$; $w = 3.90\sqrt{1.2106/9} = 1.43$

1      4      3       2
8.56   9.22   10.78   12.44
(an underscore joins 1 and 4 only)
11. The residual, percentile pairs are (-0.1225, -1.73), (-0.0992, -1.15), (-0.0825, -0.81), (-0.0758, -0.55),
(-0.0750, -0.32), (0.0117, -0.10), (0.0283, 0.10), (0.0350, 0.32), (0.0642, 0.55), (0.0708, 0.81),
(0.0875, 1.15), (0.1575, 1.73).


[Normal Probability Plot: residuals versus z-percentile]


The pattern is sufficiently linear, so normality is plausible.
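The z-percentiles paired with the ordered residuals are consistent with evaluating the standard normal quantile function at (i - .5)/n. The sketch below reproduces them with Python's standard library; it assumes the second residual is -0.0992, which keeps the list sorted and makes the residuals sum to zero.

```python
from statistics import NormalDist

# Ordered residuals from Exercise 11 (second value assumed negative).
residuals = [-0.1225, -0.0992, -0.0825, -0.0758, -0.0750, 0.0117,
             0.0283, 0.0350, 0.0642, 0.0708, 0.0875, 0.1575]
n = len(residuals)

# z-percentile for the i-th smallest residual: the (i - .5)/n quantile of N(0, 1).
z_scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

pairs = list(zip(sorted(residuals), z_scores))
for r, z in pairs:
    print(f"({r:.4f}, {z:.2f})")
```

Plotting the pairs and checking for a roughly straight line is the graphical normality check used in the solution.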


13.
a. With $Y_{ij} = X_{ij} + d$, $\bar{Y}_{i\cdot} = \bar{X}_{i\cdot} + d$, $\bar{Y}_{\cdot j} = \bar{X}_{\cdot j} + d$, and $\bar{Y}_{\cdot\cdot} = \bar{X}_{\cdot\cdot} + d$, so all quantities inside the
parentheses in (11.5) remain unchanged when the Y quantities are substituted for the corresponding X's
(e.g., $\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot} = \bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}$, etc.).


b. With $Y_{ij} = cX_{ij}$, each sum of squares for Y is the corresponding SS for X multiplied by $c^2$. However,
when F ratios are formed the $c^2$ factors cancel, so all F ratios computed from Y are identical to those
computed from X. If $Y_{ij} = cX_{ij} + d$, the conclusions reached from using the Y's will be identical to
those reached using the X's.


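15.
a. $\sum\alpha_i^2 = 24$, so $\phi^2 = \left(\frac{3}{4}\right)\left(\frac{24}{16}\right) = 1.125$, $\phi = 1.06$, $\nu_1 = 3$, $\nu_2 = 6$, and from Figure 10.5, power ≈ .2.
For the second alternative, $\phi = 1.59$, and power ≈ .43.


b. $\phi^2 = \left(\frac{4}{5}\right)\left(\frac{20}{16}\right) = 1.00$, so $\phi = 1.00$, $\nu_1 = 4$, $\nu_2 = 12$, and power ≈ .3.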
Section 11.2


17.
a.
Source         df   SS      MS      F      P-value
Sand           2    705     352.5   3.76   .065
Fiber          2    1,278   639.0   6.82   .016
Sand × Fiber   4    279     69.75   0.74   .585
Error          9    843     93.67
Total          17   3,105
P-values were obtained from software; approximations can also be acquired using Table A.9. There
appears to be an effect due to carbon fiber addition, but not due to any other effect (interaction effect or
sand addition main effect).
b.
Source         df   SS       MS      F      P-value
Sand           2    106.78   53.39   6.54   .018
Fiber          2    87.11    43.56   5.33   .030
Sand × Fiber   4    8.89     2.22    0.27   .889
Error          9    73.50    8.17
Total          17   276.28
There appears to be an effect due to both sand and carbon fiber addition to casting hardness, but no
interaction effect.
c.

Sand%       0     15    30     0      15     30     0     15     30
Fiber%      0     0     0      0.25   0.25   0.25   0.5   0.5    0.5
$\bar{x}$   62    68    69.5   69     71.5   73     68    71.5   74


The plot below indicates some effect due to sand and fiber addition with no significant interaction.
This agrees with the statistical analysis in part b.


[Interaction Plot (data means) for Hardness; lines for Carbon = 0.00, 0.25, 0.50]



Chapter 11: Multifactor Analysis of Variance 


19. 
Source                  df   SS        MS        F
Farm Type               2    35.75     17.875    0.94
Tractor Maint. Method   5    861.20    172.240   9.07
Type × Method           10   603.51    60.351    3.18
Error                   18   341.82    18.990
Total                   35   1842.28
For the interaction effect, $f_{AB}$ = 3.18 at df = (10, 18) gives P-value = .016 from software. Hence, we do not
reject $H_{0AB}$ at the .01 level (although just barely). This allows us to proceed to the main effects.
For the factor A main effect, $f_A$ = 0.94 at df = (2, 18) gives P-value = .41 from software. Hence, we clearly
fail to reject $H_{0A}$ at the .01 level: there is no statistically significant effect due to type of farm.
Finally, $f_B$ = 9.07 at df = (5, 18) gives P-value < .0002 from software. Hence, we strongly reject $H_{0B}$ at the
.01 level: there is a statistically significant effect due to tractor maintenance method.
21. From the provided SS, SSAB = 64,954.70 - [22,941.80 + 22,765.53 + 15,253.50] = 3993.87. This allows us
to complete the ANOVA table below.


Source   df   SS          MS          F
A        2    22,941.80   11,470.90   22.98
B        4    22,765.53   5691.38     11.40
AB       8    3993.87     499.23      .49
Error    15   15,253.50   1016.90
Total    29   64,954.70


$f_{AB}$ = .49 is clearly not significant. Since $22.98 \geq F_{.05,2,8} = 4.46$, the P-value for factor A is < .05 and $H_{0A}$ is
rejected. Since $11.40 \geq F_{.05,4,8} = 3.84$, the P-value for factor B is < .05 and $H_{0B}$ is also rejected. We
conclude that the different cement factors affect flexural strength differently and that batch variability
contributes to variation in flexural strength.
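The F ratios in Exercise 21 divide the main-effect mean squares by MSAB rather than MSE, which is what the tabled values 22.98 and 11.40 reflect. The sketch below recomputes the table from the sums of squares given in the solution; the dictionary layout is only an illustrative choice.

```python
# Sums of squares and degrees of freedom from Exercise 21.
sst = 64954.70
ss = {"A": 22941.80, "B": 22765.53, "Error": 15253.50}
df = {"A": 2, "B": 4, "AB": 8, "Error": 15}

# The interaction SS is the remainder.
ss["AB"] = sst - (ss["A"] + ss["B"] + ss["Error"])

ms = {name: ss[name] / df[name] for name in df}

# Main effects are tested against MSAB; the interaction is tested against MSE.
f_a = ms["A"] / ms["AB"]
f_b = ms["B"] / ms["AB"]
f_ab = ms["AB"] / ms["Error"]
print(round(ss["AB"], 2), round(f_a, 2), round(f_b, 2), round(f_ab, 2))
```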


23. Summary quantities include $x_{1\cdot\cdot} = 9410$, $x_{2\cdot\cdot} = 8835$, $x_{3\cdot\cdot} = 9234$, $x_{\cdot 1\cdot} = 5432$, $x_{\cdot 2\cdot} = 5684$, $x_{\cdot 3\cdot} = 5619$,
$x_{\cdot 4\cdot} = 5567$, $x_{\cdot 5\cdot} = 5177$, $x_{\cdot\cdot\cdot} = 27,479$, CF = 16,779,898.69, $\sum x_{i\cdot\cdot}^2 = 251,872,081$, $\sum x_{\cdot j\cdot}^2 = 151,180,459$,
resulting in the accompanying ANOVA table.


Source   df   SS          MS        F
A        2    11,573.38   5786.69   MSA/MSAB = 26.70
B        4    17,930.09   4482.52   MSB/MSAB = 20.68
AB       8    1734.17     216.77    MSAB/MSE = 1.38
Error    30   4716.67     157.22
Total    44   35,954.31


Since $1.38 < F_{.01,8,30} = 3.17$, the interaction P-value is > .01 and $H_{0G}$ cannot be rejected. We continue:
$26.70 > F_{.01,2,8} = 8.65$ => factor A P-value < .01 and $20.68 > F_{.01,4,8} = 7.01$ => factor B P-value < .01, so
both $H_{0A}$ and $H_{0B}$ are rejected. Both capping material and the different batches affect compressive strength
of concrete cylinders.
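25. With $\theta = \alpha_i - \alpha_{i'}$, $\hat{\theta} = \bar{X}_{i\cdot\cdot} - \bar{X}_{i'\cdot\cdot} = \frac{1}{JK}\sum_j\sum_k(X_{ijk} - X_{i'jk})$, and since $i \neq i'$, $X_{ijk}$ and $X_{i'jk}$ are independent
for every j, k. Thus, $V(\hat{\theta}) = V(\bar{X}_{i\cdot\cdot}) + V(\bar{X}_{i'\cdot\cdot}) = \frac{\sigma^2}{JK} + \frac{\sigma^2}{JK} = \frac{2\sigma^2}{JK}$ (because $V(\bar{X}_{i\cdot\cdot}) = V(\bar{\varepsilon}_{i\cdot\cdot})$ and
$V(\varepsilon_{ijk}) = \sigma^2$), so $\hat{\sigma}_{\hat{\theta}} = \sqrt{\frac{2MSE}{JK}}$. The appropriate number of df is IJ(K - 1), so the CI is

$(\bar{x}_{i\cdot\cdot} - \bar{x}_{i'\cdot\cdot}) \pm t_{\alpha/2,\,IJ(K-1)}\sqrt{\frac{2MSE}{JK}}$. For the data of exercise 19, $\bar{x}_{2\cdot\cdot} = 8.192$, $\bar{x}_{3\cdot\cdot} = 8.395$, MSE = .0170,
$t_{.025,9} = 2.262$, J = 3, K = 2, so the 95% C.I. for $\alpha_2 - \alpha_3$ is $(8.192 - 8.395) \pm 2.262\sqrt{\frac{.0340}{6}} = -0.203 \pm 0.170$
= (-0.373, -0.033).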


Section 11.3 


27. 
a. The last column will be used in part b. 

Source   df   SS           MS           F         F.05, num df, den df
A        2    14,144.44    7072.22      61.06     3.35
B        2    5,511.27     2755.64      23.79     3.35
C        2    244,696.39   122,348.20   1056.24   3.35
AB       4    1,069.62     267.41       2.31      2.73
AC       4    62.67        15.67        .14       2.73
BC       4    331.67       82.92        .72       2.73
ABC      8    1,080.77     135.10       1.17      2.31
Error    27   3,127.50     115.83
Total    53   270,024.33


b. The computed F-statistics for all four interaction terms (2.31, .14, .72, 1.17) are less than the tabled 
values for statistical significance at the level .05 (2.73 for AB/AC/BC, 2.31 for ABC). Hence, all four 
P-values exceed .05. This indicates that none of the interactions are statistically significant. 


c. The computed F-statistics for all three main effects (61.06, 23.79, 1056.24) exceed the tabled value for 
significance at level .05 (3.35 = F os2.27). Hence, all three P-values are less than .05 (in fact, all three 
P-values are less than .001), which indicates that all three main effects are statistically significant. 


d. Since $Q_{.05,3,27}$ is not tabled, use $Q_{.05,3,24} = 3.53$, $w = 3.53\sqrt{\frac{115.83}{(3)(3)(2)}} = 8.95$. All three levels differ
significantly from each other.
29.
a.
Source   df   SS          MS          F          P-value
A        2    1043.27     521.64      110.69     <.001
B        1    112148.10   112148.10   23798.01   <.001
C        2    3020.97     1510.49     320.53     <.001
AB       2    373.52      186.76      39.63      <.001
AC       4    392.71      98.18       20.83      <.001
BC       2    145.95      72.98       15.49      <.001
ABC      4    54.13       13.53       2.87       .029
Error    72   339.30      4.71
Total    89   117517.95
P-values were obtained using software. At the .01 significance level, all main and two-way interaction
effects are statistically significant (in fact, extremely so), but the three-way interaction is not
statistically significant (.029 > .01).

b. The means provided allow us to construct an AB interaction plot and an AC interaction plot. Based on 
the first plot, it’s actually surprising that the AB interaction effect is significant: the “bends” of the two 
paths (B = |, B = 2) are different but not that different. The AC interaction effect is more clear: the 
effect of C = 1 on mean response decreases with A (= 1, 2, 3), while the pattern for C = 2 and C =3 is 
very different (a sharp up-down-up trend). 

[Two interaction plots (data means): AB interaction plot and AC interaction plot]

31. 
a. The following ANOVA table was created with software. 
Source   df   SS       MS       F       P-value
A        2    124.60   62.30    4.85    .042
B        2    20.61    10.30    0.80    .481
C        2    356.95   178.47   13.89   .002
AB       4    57.49    14.37    1.12    .412
AC       4    61.39    15.35    1.19    .383
BC       4    11.06    2.76     0.22    .923
Error    8    102.78   12.85
Total    26   734.87


b. The P-values for the AB, AC, and BC interaction effects are provided in the table. All of them are much 
greater than .1, so none of the interaction terms are statistically significant. 




c. According to the P-values, the factor A and C main effects are statistically significant at the .05 level. 
The factor B main effect is not statistically significant. 


d. The paste thickness (factor C) means are 38.356, 35.183, and 29.560 for thickness .2, .3, and .4,
respectively. Applying Tukey's method, $Q_{.05,3,8} = 4.04$ => $w = 4.04\sqrt{12.85/9} = 4.83$.


Thickness:   .4       .3       .2
Mean:        29.560   35.183   38.356
(an underscore joins .3 and .2 only)
33. The various sums of squares yield the accompanying ANOVA table.

Source   df   SS       MS      F
A        6    67.32    11.22
B        6    51.06    8.51
C        6    5.43     .91     .61
Error    30   44.26    1.48
Total    48   168.07


We're interested in factor C. At df = (6, 30), $.61 < F_{.05,6,30} = 2.42$ => P-value > .05. Thus, we fail to reject
$H_{0C}$ and conclude that heat treatment had no effect on aging.


35.
                          1       2       3       4       5
$x_{i\cdot\cdot}$         40.68   30.04   44.02   32.14   33.21   $\sum x_{i\cdot\cdot}^2 = 6630.91$
$x_{\cdot j\cdot}$        29.19   31.61   37.31   40.16   41.82   $\sum x_{\cdot j\cdot}^2 = 6605.02$
$x_{\cdot\cdot k}$        36.59   36.67   36.03   34.50   36.30   $\sum x_{\cdot\cdot k}^2 = 6489.92$

$x_{\cdot\cdot\cdot} = 180.09$, CF = 1297.30, $\sum\sum x_{ij(k)}^2 = 1358.60$


Source   df   SS      MS     F
A        4    28.89   7.22   10.71
B        4    23.71   5.93   8.79
C        4    0.63    0.16   0.23
Error    12   8.09    0.67
Total    24   61.30


$F_{.05,4,12} = 3.26$, so the P-values for factor A and B effects are < .05 (10.71 > 3.26, 8.79 > 3.26), but the P-
value for the factor C effect is > .05 (0.23 < 3.26). Both factor A (plant) and B (leaf size) appear to affect
moisture content, but factor C (time of weighing) does not.
37. SST = (71)(93.621) = 6,647.091. Computing all other sums of squares and adding them up = 6,645.702.
Thus SSABCD = 6,647.091 - 6,645.702 = 1.389 and MSABCD = 1.389/4 = .347.


Source   df   MS         F         F.01, num df, den df*
A        2    2207.329   2259.29   5.39
B        1    47.255     48.37     7.56
C        2    491.783    503.36    5.39
D        1    .044       .05       7.56
AB       2    15.303     15.66     5.39
AC       4    275.446    281.93    4.02
AD       2    .470       .48       5.39
BC       2    2.141      2.19      5.39
BD       1    .273       .28       7.56
CD       2    .247       .25       5.39
ABC      4    3.714      3.80      4.02
ABD      2    4.072      4.17      5.39
ACD      4    .767       .79       4.02
BCD      2    .280       .29       5.39
ABCD     4    .347       .355      4.02
Error    36   .977
Total    71
*Because denominator df of 36 is not tabled, use df = 30.


To be significant at the .01 level (P-value < .01), the calculated F statistic must be greater than the .01
critical value in the far right column. At level .01 the statistically significant main effects are A, B, C. The
interactions AB and AC are also statistically significant. No other interactions are statistically significant.


Section 11.4 


39. 


Start by applying Yates' method. Each sum of squares is given by SS = (effect contrast)$^2$/24.


Condition   Total $x_{ijk\cdot}$   1      2      Effect Contrast   SS
(1)         315                    927    2478   5485
a           612                    1551   3007   1307              SSA = 71,177.04
b           584                    1163   680    1305              SSB = 70,959.38
ab          967                    1844   627    199               SSAB = 1650.04
c           453                    297    624    529               SSC = 11,660.04
ac          710                    383    681    -53               SSAC = 117.04
bc          737                    257    86     57                SSBC = 135.38
abc         1107                   370    113    27                SSABC = 30.38
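The columns of the Yates table can be regenerated mechanically: each pass replaces the column by the pairwise sums followed by the pairwise differences (second minus first). The following is a minimal sketch of that procedure applied to the eight totals above; the function name `yates` is only illustrative.

```python
def yates(totals):
    """Yates' algorithm for a 2^k design with totals in standard order."""
    column = list(totals)
    k = len(column).bit_length() - 1  # number of factors, assuming len is 2^k
    for _ in range(k):
        sums = [column[i] + column[i + 1] for i in range(0, len(column), 2)]
        diffs = [column[i + 1] - column[i] for i in range(0, len(column), 2)]
        column = sums + diffs
    return column

labels = ["(1)", "a", "b", "ab", "c", "ac", "bc", "abc"]
totals = [315, 612, 584, 967, 453, 710, 737, 1107]
contrasts = yates(totals)

# Each effect SS is contrast^2 / (2^k * n) = contrast^2 / 24 here.
effect_ss = {lab: c ** 2 / 24 for lab, c in zip(labels[1:], contrasts[1:])}
for lab, c in zip(labels, contrasts):
    print(lab, c, round(effect_ss.get(lab, float("nan")), 2))
```

The first entry of the final column is the grand total (5485), which is not an effect contrast.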


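a. Totals appear above. From these, the estimated factor B main effect parameter (high level of B) is

$\frac{584 + 967 + 737 + 1107 - 315 - 612 - 453 - 710}{24} = 54.38$;

and the estimated AC interaction parameter is

$\frac{315 - 612 + 584 - 967 - 453 + 710 - 737 + 1107}{24} = -2.21$ when A and C are at the same level, and 2.21 otherwise.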
b. Factor sums of squares appear in the preceding table. From the original data, $\sum\sum\sum\sum x_{ijkl}^2 = 1,411,889$
and $x_{\cdot\cdot\cdot\cdot} = 5485$, so SST = 1,411,889 - 5485$^2$/24 = 158,337.96, from which SSE = 2608.7 (the
remainder).

Source   df   SS           MS          F        P-value
A        1    71,177.04    71,177.04   436.56   <.001
B        1    70,959.38    70,959.38   435.22   <.001
AB       1    1650.04      1650.04     10.12    .006
C        1    11,660.04    11,660.04   71.52    <.001
AC       1    117.04       117.04      0.72     .409
BC       1    135.38       135.38      0.83     .376
ABC      1    30.38        30.38       0.19     .672
Error    16   2608.7       163.04
Total    23   158,337.96


P-values were obtained from software. Alternatively, a P-value less than .05 requires an F statistic
greater than $F_{.05,1,16} = 4.49$. We see that the AB interaction and all the main effects are significant.


c. Yates' algorithm generates the 15 effect SS's in the ANOVA table; each SS is (effect contrast)$^2$/48.
From the original data, $\sum\sum\sum\sum\sum x_{ijklm}^2 = 3,308,143$ and $x_{\cdot\cdot\cdot\cdot\cdot} = 11,956$ => SST = 3,308,143 - 11,956$^2$/48
= 328,607.98. SSE is the remainder: SSE = SST - [sum of effect SS's] = ... = 4,339.33.

Source   df   SS           MS           F
A        1    136,640.02   136,640.02   1007.6
B        1    139,644.19   139,644.19   1029.8
C        1    24,616.02    24,616.02    181.5
D        1    20,377.52    20,377.52    150.3
AB       1    2,173.52     2,173.52     16.0
AC       1    2.52         2.52         0.0
AD       1    58.52        58.52        0.4
BC       1    165.02       165.02       1.2
BD       1    9.19         9.19         0.1
CD       1    17.52        17.52        0.1
ABC      1    42.19        42.19        0.3
ABD      1    117.19       117.19       0.9
ACD      1    188.02       188.02       1.4
BCD      1    13.02        13.02        0.1
ABCD     1    204.19       204.19       1.5
Error    32   4,339.33     135.60
Total    47   328,607.98


In this case, a P-value less than .05 requires an F statistic greater than $F_{.05,1,32} \approx 4.15$. Thus, all four
main effects and the AB interaction effect are statistically significant at the .05 level (and no other
effects are).




41. The accompanying ANOVA table was created using software. All F statistics are quite large (some 
extremely so) and all P-values are very small. So, in fact, all seven effects are statistically significant for 


predicting quality. 


Source   df   SS        MS        F         P-value
A        1    .003906   .003906   25.00     .001
B        1    .242556   .242556   1552.36   <.001
C        1    .003906   .003906   25.00     .001
AB       1    .178506   .178506   1142.44   <.001
AC       1    .002256   .002256   14.44     .005
BC       1    .178506   .178506   1142.44   <.001
ABC      1    .002256   .002256   14.44     .005
Error    8    .001252   .000156
Total    15   .613144
43.
Condition/Effect   SS = (contrast)^2/16   F      Condition/Effect   SS = (contrast)^2/16   F
(1)                -                             D                  414.123                850.77
A                  .436                   <1     AD                 .017                   <1
B                  .099                   <1     BD                 .456                   <1
AB                 .003                   <1     ABD                .990                   -
C                  .109                   <1     CD                 2.190                  4.50
AC                 .078                   <1     ACD                1.020                  -
BC                 1.404                  3.62   BCD                .133                   -
ABC                .286                   -      ABCD               .004                   -


SSE = .286 + .990 + 1.020 + .133 + .004 = 2.433, df = 5, so MSE = .487, which forms the denominators of the F
values above. A P-value less than .05 requires an F statistic greater than $F_{.05,1,5} = 6.61$, so only the D main effect
is significant.


45. 
a. The allocation of treatments to blocks is as given in the answer section (see back of book), with block
#1 containing all treatments having an even number of letters in common with both ab and cd, block
#2 those having an odd number in common with ab and an even number with cd, etc.


b. $\sum\sum\sum\sum\sum x_{ijklm}^2 = 9,035,054$ and $x_{\cdot\cdot\cdot\cdot\cdot} = 16,898$, so $SST = 9,035,054 - \frac{16,898^2}{32} = 111,853.875$. The eight
block-replication totals are 2091 (= 618 + 421 + 603 + 449, the sum of the four observations in block
#1 on replication #1), 2092, 2133, 2145, 2113, 2080, 2122, and 2122, so

$SSBl = \frac{2091^2}{4} + \dots + \frac{2122^2}{4} - \frac{16,898^2}{32} = 898.875$. The effect SS's can be computed via Yates'
algorithm; those we keep appear below. SSE is computed by SST - [sum of all other SS]. MSE =
5475.75/12 = 456.3125, which forms the denominator of the F ratios below. With $F_{.01,1,12} = 9.33$, only
the A and B main effects are significant.
Source   df   SS           F
A        1    12403.125    27.18
B        1    92235.125    202.13
C        1    3.125        0.01
D        1    60.500       0.13
AC       1    10.125       0.02
BC       1    91.125       0.20
AD       1    50.000       0.11
BD       1    420.500      0.92
ABC      1    3.125        0.01
ABD      1    0.500        0.00
ACD      1    200.000      0.44
BCD      1    2.000        0.00
Block    7    898.875      0.28
Error    12   5475.750
Total    31   111853.875

The third nonestimable effect is (ABCDE)(CDEFG) = ABFG. The treatments in the group containing (1)
are (1), ab, cd, ce, de, fg, acf, adf, adg, aef, acg, aeg, bcg, bcf, bdf, bdg, bef, beg, abcd, abce, abde, abfg,
cdfg, cefg, defg, acdef, acdeg, bcdef, bcdeg, abcdfg, abcefg, abdefg. The alias groups of the seven main
effects are {A, BCDE, ACDEFG, BFG}, {B, ACDE, BCDEFG, AFG}, {C, ABDE, DEFG, ABCFG},
{D, ABCE, CEFG, ABDFG}, {E, ABCD, CDFG, ABEFG}, {F, ABCDEF, CDEG, ABG}, and
{G, ABCDEG, CDEF, ABF}.


1: (1), aef, beg, abcd, abfg, cdfg, acdeg, bcdef; 2: ab, cd, fg, aeg, bef, acdef, bcdeg, abcdfg; 3: de, acg, adf,
bcf, bdg, abce, cefg, abdefg; 4: ce, acf, adg, bcg, bdf, abde, defg, abcefg.


a       70.4
b       72.1
c       70.4
abc     73.8
d       67.4
abd     67.0
acd     66.6
bcd     66.8
e       68.0
abe     67.8
ace     67.5
bce     70.3
ade     64.0
bde     7.9
cde     65.9
abcde   68.0
Thus $SSA = \frac{(70.4 - 72.1 - 70.4 + \dots + 68.0)^2}{16} = 2.250$, SSB = 7.840, SSC = .360, SSD = 52.563, SSE =
10.240. SSAB = 1.563, SSAC = 7.563, SSAD = .090, SSAE = 4.203, SSBC = 2.103, SSBD = .010, SSBE
= .123, SSCD = .010, SSCE = .063, SSDE = 4.840, Error SS = sum of two factor SS's = 20.568, Error MS
= 2.057, $F_{.01,1,10} = 10.04$, so only the D main effect is significant.


Supplementary Exercises 


51.
Source   df   SS        MS        F
A        1    322.667   322.667   980.38
B        3    35.623    11.874    36.08
AB       3    8.557     2.852     8.67
Error    16   5.266     .329
Total    23   372.113


We first test the null hypothesis of no interactions ($H_{0AB}: \gamma_{ij} = 0$ for all i, j). At df = (3, 16), 5.29 < 8.67 <
9.01 => .001 < P-value < .01. Therefore, $H_{0AB}$ is rejected. Because we have concluded that interaction is
present, tests for main effects are not appropriate.
53. Let A = spray volume, B = belt speed, C = brand. The Yates table and ANOVA table are below. At degrees
of freedom = (1, 8), a P-value less than .05 requires $F > F_{.05,1,8} = 5.32$. So all of the main effects are
significant at level .05, but none of the interactions are significant.

Condition   Total   1     2     Contrast   SS = (contrast)^2/16
(1)         76      129   289   592        21,904.00
A           53      160   303   22         30.25
B           62      143   13    48         144.00
AB          98      160   9     134        1122.25
C           88      -23   31    14         12.25
AC          55      36    17    -4         1.00
BC          59      -33   59    -14        12.25
ABC         101     42    75    16         16.00

Effect   df   MS        F
A        1    30.25     6.72
B        1    144.00    32.00
AB       1    1122.25   249.39
C        1    12.25     2.72
AC       1    1.00      .22
BC       1    12.25     2.72
ABC      1    16.00     3.56
Error    8    4.50
Total    15
55.
a.
Effect   %Iron   1     2     3     Effect Contrast   SS
(1)      7       18    37    174   684
A        11      19    137   510   144               1296
B        7       62    169   50    36                81
AB       12      75    341   94    0                 0
C        21      79    9     14    272               4624
AC       41      90    41    22    32                64
BC       27      165   47    2     12                9
ABC      48      176   47    2     4                 1
D        28      4     1     100   336               7056
AD       51      5     13    172   44                121
BD       33      20    11    32    8                 4
ABD      57      21    11    0     0                 0
CD       70      23    1     12    72                324
ACD      95      24    1     0     -32               64
BCD      77      25    1     0     -12               9
ABCD     99      22    3     4     4                 1

We use estimate = contrast/$2^p$ when n = 1 to get $\hat{\alpha}_1 = \frac{144}{16} = 9.00$, $\hat{\beta}_1 = \frac{36}{16} = 2.25$,
$\hat{\delta}_1 = \frac{272}{16} = 17.00$, $\hat{\gamma}_1 = \frac{336}{16} = 21.00$. Similarly, $(\widehat{\alpha\beta})_{11} = 0$, $(\widehat{\alpha\delta})_{11} = 2.00$, $(\widehat{\alpha\gamma})_{11} = 2.75$,
$(\widehat{\beta\delta})_{11} = .75$, $(\widehat{\beta\gamma})_{11} = .50$, and $(\widehat{\delta\gamma})_{11} = 4.50$.


b. The plot suggests main effects A, C, and D are quite important, and perhaps the interaction CD as well.
In fact, pooling the 4 three-factor interaction SS's and the four-factor interaction SS to obtain an SSE
based on 5 df and then constructing an ANOVA table suggests that these are the most important
effects.


[Normal probability plot of the effects; horizontal axis: z-percentile]
57. The ANOVA table is:


Source   df   SS       MS       F       F.01, num df, den df
A        2    67553    33777    11.37   5.49
B        2    72361    36181    12.18   5.49
C        2    442111   221056   74.43   5.49
AB       4    9696     2424     0.82    4.11
AC       4    6213     1553     0.52    4.11
BC       4    34928    8732     2.94    4.11
ABC      8    33487    4186     1.41    3.26
Error    27   80192    2970
Total    53   746542


A P-value less than .01 requires an F statistic greater than the $F_{.01}$ value at the appropriate df (see the far
right column). All three main effects are statistically significant at the 1% level, but no interaction terms are
statistically significant at that level.


59. Based on the P-values in the ANOVA table, statistically significant factors at the level .01 are adhesive 
type and cure time. The conductor material does not have a statistically significant effect on bond strength. 
There are no significant interactions. 


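61. $SSA = \sum_i\sum_j(\bar{X}_{i\cdot\cdot\cdot} - \bar{X}_{\cdot\cdot\cdot\cdot})^2 = \frac{1}{N}\sum X_{i\cdot\cdot\cdot}^2 - \frac{X_{\cdot\cdot\cdot\cdot}^2}{N^2}$, with similar expressions for SSB, SSC, and SSD, each having
N - 1 df.

$SST = \sum_i\sum_j(X_{ij(kl)} - \bar{X}_{\cdot\cdot\cdot\cdot})^2 = \sum_i\sum_j X_{ij(kl)}^2 - \frac{X_{\cdot\cdot\cdot\cdot}^2}{N^2}$ with $N^2 - 1$ df, leaving $N^2 - 1 - 4(N - 1)$ df for error.


                                1     2     3     4     5     $\sum x^2$
$x_{i\cdot\cdot\cdot}$:         482   446   464   468   434   1,053,916
$x_{\cdot j\cdot\cdot}$:        470   451   440   482   451   1,053,626
$x_{\cdot\cdot k\cdot}$:        372   429   484   528   481   1,066,826
$x_{\cdot\cdot\cdot l}$:        340   417   466   537   534   1,080,170


Also, $\sum\sum x_{ij(kl)}^2 = 220,378$, $x_{\cdot\cdot\cdot\cdot} = 2294$, and CF = 210,497.44.


Source   df   SS        MS        F
A        4    285.76    71.44     .594
B        4    227.76    56.94     .473
C        4    2867.76   716.94    5.958
D        4    5536.56   1384.14   11.502
Error    8    962.72    120.34
Total    24   9880.56


At df = (4, 8), a P-value less than .05 requires an F-statistic greater than $F_{.05,4,8} = 3.84$. $H_{0A}$ and $H_{0B}$ cannot
be rejected, while $H_{0C}$ and $H_{0D}$ are rejected.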
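The ANOVA table for Exercise 61 can be rebuilt from the factor-level totals using the computing formulas above. The sketch below assumes N = 5 and that the first factor C total is 372, the value that makes that row sum to the grand total 2294 and reproduce the stated sum of squares 1,066,826.

```python
N = 5
totals = {
    "A": [482, 446, 464, 468, 434],
    "B": [470, 451, 440, 482, 451],
    "C": [372, 429, 484, 528, 481],  # first entry inferred from the row total
    "D": [340, 417, 466, 537, 534],
}
grand_total = 2294
sum_of_squares = 220378  # sum of all squared observations

cf = grand_total ** 2 / N ** 2                      # correction factor
ss = {k: sum(t * t for t in v) / N - cf for k, v in totals.items()}
sst = sum_of_squares - cf
sse = sst - sum(ss.values())
error_df = N ** 2 - 1 - 4 * (N - 1)
mse = sse / error_df
f_ratios = {k: (ss[k] / (N - 1)) / mse for k in ss}

for k in ss:
    print(k, round(ss[k], 2), round(f_ratios[k], 3))
print("Error", round(sse, 2), "Total", round(sst, 2))
```

Comparing each F ratio with 3.84 gives the same decisions as the solution: only factors C and D are significant.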
CHAPTER 12


Section 12.1


1.
a. Stem and Leaf display of temp:

17|0
17|23
17|445
17|67
17|
18|0000011
18|2222
18|445
18|6
18|8
stem = tens
leaf = ones

180 appears to be a typical value for this data. The distribution is reasonably symmetric in appearance
and somewhat bell-shaped. The variation in the data is fairly small since the range of values (188 -
170 = 18) is fairly small compared to the typical value of 180.

0|889
1|0000
1|3
1|4444
1|66
1|8889
2|11

wBeNN NN N 
nn 


S 
o 


stem = ones
leaf = tenths

For the ratio data, a typical value is around 1.6 and the distribution appears to be positively skewed.
The variation in the data is large since the range of the data (3.08 - .84 = 2.24) is very large compared
to the typical value of 1.6. The two largest values could be outliers.


b. The efficiency ratio is not uniquely determined by temperature since there are several instances in the
data of equal temperatures associated with different efficiency ratios. For example, the five
observations with temperatures of 180 each have different efficiency ratios.
Chapter 12: Simple Linear Regression and Correlation


c. A scatter plot of the data appears below. The points exhibit quite a bit of variation and do not appear
to fall close to any straight line or simple curve.


[Scatter plot of efficiency ratio versus temperature (Temp)]

3. A scatter plot of the data appears below. The points fall very close to a straight line with an intercept of
approximately 0 and a slope of about 1. This suggests that the two methods are producing substantially the
same concentration measurements.


[Scatter plot of the concentration measurements from the two methods]
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Chapter 12: Simple Linear Regression and Correlation 


a. The scatter plot with axes intersecting at (0,0) is shown below. 


Temperature (x) vs Elongation (y) 


250 Fa 
200 * 
. 
150 
*. 
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b. The scatter plot with axes intersecting at (55, 100) is shown below. 


Temperature (x) vs Elongation (y) 


c. A parabola appears to provide a good fit to both graphs. 


7.
a. $\mu_{Y \cdot 2500} = 1800 + 1.3(2500) = 5050$
b. expected change = slope = $\beta_1$ = 1.3
c. expected change = $100\beta_1$ = 130
d. expected change = $-100\beta_1$ = -130

9.
a. $\beta_1$ = expected change in flow rate (y) associated with a one inch increase in pressure drop (x) = .095.

b. We expect flow rate to decrease by $5\beta_1$ = .475.

c. $\mu_{Y \cdot 10} = -.12 + .095(10) = .83$, and $\mu_{Y \cdot 15} = -.12 + .095(15) = 1.305$.

d. $P(Y > .835) = P\left(Z > \frac{.835 - .830}{.025}\right) = P(Z > .20) = .4207$.

$P(Y > .840) = P\left(Z > \frac{.840 - .830}{.025}\right) = P(Z > .40) = .3446$.

e. Let $Y_1$ and $Y_2$ denote the flow rates for pressure drops of 10 and 11, respectively. Then $\mu_{Y \cdot 11} = .925$, so $Y_1 - Y_2$ has expected value .830 - .925 = -.095 and sd $\sqrt{(.025)^2 + (.025)^2} = .035355$. Thus

$P(Y_1 > Y_2) = P(Y_1 - Y_2 > 0) = P\left(Z > \frac{0 - (-.095)}{.035355}\right) = P(Z > 2.69) = .0036$.


11.
a. $\beta_1$ = expected change for a one degree increase = -.01, and $10\beta_1 = -.1$ is the expected change for a 10 degree increase.

b. $\mu_{Y \cdot 200} = 5.00 - .01(200) = 3$, and $\mu_{Y \cdot 250} = 2.5$.

c. The probability that the first observation is between 2.4 and 2.6 is

$P(2.4 \le Y \le 2.6) = P\left(\frac{2.4 - 2.5}{.075} \le Z \le \frac{2.6 - 2.5}{.075}\right) = P(-1.33 \le Z \le 1.33) = .8164$.

The probability that any particular one of the other four observations is between 2.4 and 2.6 is also .8164, so the probability that all five are between 2.4 and 2.6 is $(.8164)^5 = .3627$.

d. Let $Y_1$ and $Y_2$ denote the times at the higher and lower temperatures, respectively. Then $Y_1 - Y_2$ has expected value $5.00 - .01(x+1) - (5.00 - .01x) = -.01$. The standard deviation of $Y_1 - Y_2$ is $\sqrt{(.075)^2 + (.075)^2} = .10607$. Thus

$P(Y_1 - Y_2 > 0) = P\left(Z > \frac{0 - (-.01)}{.10607}\right) = P(Z > .09) = .4641$.
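The normal-probability steps in Exercise 11 can be checked numerically. The sketch below is only an illustration using Python's standard library; the variable names are ours, and because it does not round z to two decimals the way a table lookup does, its results differ slightly from the table-based values above (about .8176 rather than .8164 for part c).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

# Quantities taken from Exercise 11: mean time is 5.00 - .01x and sigma = .075
sigma = 0.075
mean_250 = 5.00 - 0.01 * 250  # mean time at x = 250

# Part c: one observation falls in (2.4, 2.6); then all five independent observations do
z_hi = (2.6 - mean_250) / sigma
p_one = Z.cdf(z_hi) - Z.cdf(-z_hi)
p_all_five = p_one ** 5

# Part d: Y1 - Y2 has mean -.01 and sd sqrt(.075^2 + .075^2)
sd_diff = (2 * sigma ** 2) ** 0.5
p_d = 1 - Z.cdf((0 - (-0.01)) / sd_diff)

print(round(p_one, 4), round(p_all_five, 4), round(p_d, 4))
```

The small gaps between these numbers and the manual's are rounding effects, not a different method.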


Section 12.2 


13. For this data, n = 4, $\Sigma x_i = 200$, $\Sigma y_i = 5.37$, $\Sigma x_i^2 = 12{,}000$, $\Sigma y_i^2 = 9.3501$, $\Sigma x_i y_i = 333$ =>

$S_{xx} = 12{,}000 - \frac{(200)^2}{4} = 2000$, $SST = S_{yy} = 9.3501 - \frac{(5.37)^2}{4} = 2.140875$, $S_{xy} = 333 - \frac{(200)(5.37)}{4} = 64.5$

=> $\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{64.5}{2000} = .03225$ => $SSE = S_{yy} - \hat\beta_1 S_{xy} = 2.140875 - (.03225)(64.5) = .060750$.

From these calculations, $r^2 = 1 - \frac{SSE}{SST} = 1 - \frac{.060750}{2.140875} = .972$. This is a very high value of $r^2$, which confirms the authors' claim that there is a strong linear relationship between the two variables. (A scatter plot also shows a strong, linear relationship.)
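The arithmetic in Exercise 13 follows a pattern that recurs throughout this chapter: everything is computed from n and five sums. A minimal sketch of that pattern is below; the function name and the dictionary keys are illustrative choices, not notation from the text.

```python
def fit_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Least squares quantities computed from summary sums only."""
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n          # this is SST
    sxy = sum_xy - sum_x * sum_y / n
    slope = sxy / sxx
    intercept = (sum_y - slope * sum_x) / n
    sse = syy - slope * sxy
    return {
        "sxx": sxx, "syy": syy, "sxy": sxy,
        "slope": slope, "intercept": intercept,
        "sse": sse, "r2": 1 - sse / syy,
    }

# Summary statistics quoted in Exercise 13
fit13 = fit_from_sums(4, 200, 5.37, 12_000, 9.3501, 333)
print(fit13)
```

The same helper applies to any exercise that supplies these six summary values.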
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15. 

a. The following stem and leaf display shows that a typical value for this data is a number in the low 40's. There is some positive skew in the data. There are some potential outliers (79.5 and 80.0), and there is a reasonably large amount of variation in the data (e.g., the spread 80.0 - 29.8 = 50.2 is large compared with the typical values in the low 40's).

2|9
3|33              stem = tens
3|5566677889      leaf = ones
4|1223
4|56689
5|1
5|
6|2
6|9
7|
7|9
8|0

b. No, the strength values are not uniquely determined by the MoE values. For example, note that the 
two pairs of observations having strength values of 42.8 have different MoE values. 

c. The least squares line is $\hat{y} = 3.2925 + .10748x$. For a beam whose modulus of elasticity is x = 40, the predicted strength would be $\hat{y} = 3.2925 + .10748(40) = 7.59$. The value x = 100 is far beyond the range of the x values in the data, so it would be dangerous (i.e., potentially misleading) to extrapolate the linear relationship that far.

d. From the output, SSE = 18.736, SST = 71.605, and the coefficient of determination is $r^2$ = .738 (or 73.8%). The $r^2$ value is large, which suggests that the linear relationship is a useful approximation to the true relationship between these two variables.

17. 


a. From software, the equation of the least squares line is $\hat{y} = 118.91 - .905x$. The accompanying fitted line plot shows a very strong, linear association between unit weight and porosity. So, yes, we anticipate the linear model will explain a great deal of the variation in y.

[Fitted Line Plot: porosity = 118.9 - 0.9047 weight]

b. The slope of the line is $b_1 = -.905$. A one-pcf increase in the unit weight of a concrete specimen is associated with a .905 percentage point decrease in the specimen's predicted porosity. (Note: slope is not ordinarily a percent decrease, but the units on porosity, y, are percentage points.)

c. When x = 135, the predicted porosity is $\hat{y} = 118.91 - .905(135) = -3.265$. That is, we get a negative prediction for y, but in actuality y cannot be negative! This is an example of the perils of extrapolation; notice that x = 135 is outside the scope of the data.

d. The first observation is (99.0, 28.8). So, the actual value of y is 28.8, while the predicted value of y is 118.91 - .905(99.0) = 29.315. The residual for the first observation is $y - \hat{y} = 28.8 - 29.315 = -.515 \approx -.52$. Similarly, for the second observation we have $\hat{y} = 27.41$ and residual = 27.9 - 27.41 = .49.

e. From software and the data provided, a point estimate of $\sigma$ is s = .938. This represents the "typical" size of a deviation from the least squares line. More precisely, predictions from the least squares line are "typically" ± .938% off from the actual porosity percentage.

f. From software, $r^2$ = 97.4% or .974, the proportion of observed variation in porosity that can be attributed to the approximate linear relationship between unit weight and porosity.


19. n = 14, $\Sigma x_i = 3300$, $\Sigma y_i = 5010$, $\Sigma x_i^2 = 913{,}750$, $\Sigma y_i^2 = 2{,}207{,}100$, $\Sigma x_i y_i = 1{,}413{,}500$

a. $\hat\beta_1 = \frac{3{,}256{,}000}{1{,}902{,}500} = 1.71143233$, $\hat\beta_0 = -45.55190543$, so the equation of the least squares line is roughly $\hat{y} = -45.5519 + 1.7114x$.

b. $\hat\mu_{Y \cdot 225} = -45.5519 + 1.7114(225) = 339.51$.

c. Estimated expected change = $-50\hat\beta_1 = -85.57$.

d. No, the value 500 is outside the range of x values for which observations were available (the danger of extrapolation).

21.
a. Yes: a scatter plot of the data shows a strong, linear pattern, and $r^2$ = 98.5%.

b. From the output, the estimated regression line is $\hat{y} = 321.878 + 156.711x$, where x = absorbance and y = resistance angle. For x = .300, $\hat{y} = 321.878 + 156.711(.300) = 368.89$.

c. The estimated regression line serves as an estimate both for a single y at a given x-value and for the true average $\mu_y$ at a given x-value. Hence, our estimate for $\mu_y$ when x = .300 is also 368.89.

23.
a. Using the given $y_i$'s and the formula $\hat{y}_i = -45.5519 + 1.7114x_i$, $SSE = (150 - 125.6)^2 + \ldots + (670 - 639.0)^2 = 16{,}213.64$. The computation formula gives $SSE = 2{,}207{,}100 - (-45.55190543)(5010) - (1.71143233)(1{,}413{,}500) = 16{,}205.45$.

b. $SST = 2{,}207{,}100 - \frac{(5010)^2}{14} = 414{,}235.71$, so $r^2 = 1 - \frac{16{,}205.45}{414{,}235.71} = .961$.

25. Substitution of $\hat\beta_0 = \frac{\Sigma y_i - \hat\beta_1 \Sigma x_i}{n}$ and $\hat\beta_1$ for $b_0$ and $b_1$ on the left-hand side of the first normal equation yields

$n\,\frac{\Sigma y_i - \hat\beta_1 \Sigma x_i}{n} + \hat\beta_1 \Sigma x_i = \Sigma y_i - \hat\beta_1 \Sigma x_i + \hat\beta_1 \Sigma x_i = \Sigma y_i$,

which verifies they satisfy the first one. The same substitution in the left-hand side of the second equation gives

$(\Sigma x_i)\frac{\Sigma y_i - \hat\beta_1 \Sigma x_i}{n} + \hat\beta_1 \Sigma x_i^2 = (\Sigma x_i)(\Sigma y_i)/n + \hat\beta_1\left(\Sigma x_i^2 - (\Sigma x_i)^2/n\right)$.

The last term in parentheses is $S_{xx}$, so making that substitution along with Equation (12.2) we have

$(\Sigma x_i)(\Sigma y_i)/n + \frac{S_{xy}}{S_{xx}}(S_{xx}) = (\Sigma x_i)(\Sigma y_i)/n + S_{xy}$.

By the definition of $S_{xy}$, this last expression is exactly $\Sigma x_i y_i$, which verifies that the slope and intercept formulas satisfy the second normal equation.


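27. We wish to find $b_1$ to minimize $f(b_1) = \Sigma (y_i - b_1 x_i)^2$. Equating $f'(b_1)$ to 0 yields

$\Sigma\left[2(y_i - b_1 x_i)(-x_i)\right] = 0 \Rightarrow 2\Sigma\left[-x_i y_i + b_1 x_i^2\right] = 0 \Rightarrow \Sigma x_i y_i = b_1 \Sigma x_i^2$ and $b_1 = \frac{\Sigma x_i y_i}{\Sigma x_i^2}$.

The least squares estimator of $\beta_1$ is thus $\hat\beta_1 = \frac{\Sigma x_i Y_i}{\Sigma x_i^2}$.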
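The result of Exercise 27 can be sanity-checked numerically. The sketch below uses made-up data (the four points are purely illustrative, not from any exercise) and confirms that the closed-form slope gives a smaller sum of squares than nearby slopes.

```python
def slope_through_origin(xs, ys):
    """Least squares slope for the no-intercept model y = b1 * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def sum_sq(b1, xs, ys):
    """Objective f(b1) = sum of (y_i - b1 * x_i)^2."""
    return sum((y - b1 * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, chosen only to exercise the formula
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

b1 = slope_through_origin(xs, ys)
print(b1, sum_sq(b1, xs, ys))
```

Because f is a convex quadratic in $b_1$, the stationary point found by setting the derivative to zero is the global minimum, which is what the comparison with neighboring slopes illustrates.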


29. For data set #1, $r^2$ = .43 and s = 4.03; for #2, $r^2$ = .99 and s = 4.03; for #3, $r^2$ = .99 and s = 1.90. In general, we hope for both large $r^2$ (large % of variation explained) and small s (indicating that observations don't deviate much from the estimated line). Simple linear regression would thus seem to be most effective for data set #3 and least effective for data set #1.


Section 12.3 


31. 

a. Software output from least squares regression on this data appears below. From the output, we see that $r^2$ = 89.26% or .8926, meaning 89.26% of the observed variation in threshold stress (y) can be attributed to the (approximate) linear model relationship with yield strength (x).

Regression Equation

y = 211.655 - 0.175635 x

Coefficients
Term       Coef      SE Coef   T         P       95% CI
Constant   211.655   15.0622   14.0521   0.000   (178.503, 244.807)
x          -0.176    0.0184    -9.5618   0.000   ( -0.216,  -0.135)

Summary of Model

S = 6.80578   R-Sq = 89.26%   R-Sq(adj) = 88.28%

b. From the software output, $\hat\beta_1 = -0.176$ and $s_{\hat\beta_1} = 0.0184$. Alternatively, the residual standard deviation is s = 6.80578, and the sum of squared deviations of the x-values can be calculated to equal $S_{xx} = \Sigma (x_i - \bar{x})^2 = 138095$. From these, $s_{\hat\beta_1} = \frac{s}{\sqrt{S_{xx}}} = \frac{6.80578}{\sqrt{138095}} = .0183$ (due to some slight rounding error).

c. From the software output, a 95% CI for $\beta_1$ is (-0.216, -0.135). This is a fairly narrow interval, so $\beta_1$ has indeed been precisely estimated. Alternatively, with n = 13 (so n - 2 = 11 df) we may construct a 95% CI for $\beta_1$ as $\hat\beta_1 \pm t_{.025,11}\, s_{\hat\beta_1} = -0.176 \pm 2.201(.0184) = (-0.216, -0.136)$.

33. 


a. Error df = n - 2 = 25, $t_{.025,25} = 2.060$, and so the desired confidence interval is $\hat\beta_1 \pm t_{.025,25} \cdot s_{\hat\beta_1} = .10748 \pm (2.060)(.01280) = (.081, .134)$. We are 95% confident that the true average change in strength associated with a 1 GPa increase in modulus of elasticity is between .081 MPa and .134 MPa.

b. We wish to test $H_0$: $\beta_1 \le .1$ versus $H_a$: $\beta_1 > .1$. The calculated test statistic is

$t = \frac{\hat\beta_1 - .1}{s_{\hat\beta_1}} = \frac{.10748 - .1}{.01280} = .58$,

which yields a P-value of .277 at 25 df. Thus, we fail to reject $H_0$; i.e., there is not enough evidence to contradict the prior belief.

35.
a. We want a 95% CI for $\beta_1$. Using the given summary statistics, $S_{xx} = 3056.69 - \frac{(222.1)^2}{17} = 155.019$, $S_{xy} = 2759.6 - \frac{(222.1)(193)}{17} = 238.112$, and $\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{238.112}{155.019} = 1.536$. We need $\hat\beta_0 = \frac{193 - (1.536)(222.1)}{17} = -8.715$ to calculate the SSE: $SSE = 2975 - (-8.715)(193) - (1.536)(2759.6) = 418.2494$. Then $s = \sqrt{\frac{418.2494}{15}} = 5.28$ and $s_{\hat\beta_1} = \frac{5.28}{\sqrt{155.019}} = .424$. With $t_{.025,15} = 2.131$, our CI is $1.536 \pm 2.131(.424) = (.632, 2.440)$. With 95% confidence, we estimate that the change in reported nausea percentage for every one-unit change in motion sickness dose is between .632 and 2.440.

b. We test the hypotheses $H_0$: $\beta_1 = 0$ versus $H_a$: $\beta_1 \ne 0$, and the test statistic is $t = \frac{1.536}{.424} = 3.6226$. With df = 15, the two-tailed P-value = 2P(T > 3.6226) = 2(.001) = .002. With a P-value of .002, we would reject the null hypothesis at most reasonable significance levels. This suggests that there is a useful linear relationship between motion sickness dose and reported nausea.

c. No. A regression model is only useful for estimating values of nausea % when using dosages between 6.0 and 17.6, the range of values sampled.

d. Removing the point (6.0, 2.50), the new summary stats are: n = 16, $\Sigma x_i = 216.1$, $\Sigma y_i = 191.5$, $\Sigma x_i^2 = 3020.69$, $\Sigma y_i^2 = 2968.75$, $\Sigma x_i y_i = 2744.6$, and then $\hat\beta_1 = 1.561$, $\hat\beta_0 = -9.118$, SSE = 430.5264, s = 5.55, $s_{\hat\beta_1} = .551$, and the new CI is $1.561 \pm 2.145(.551)$, or (.379, 2.743). The interval is a little wider, but removing the one observation did not change it that much. The observation does not seem to be exerting undue influence.
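The chain of calculations in part a of Exercise 35 (summary sums, then slope, SSE, standard error, interval, and test statistic) can be sketched in a few lines. This is only an illustration: the variable names are ours, the critical value 2.131 is copied from the solution rather than computed, and because the code carries full precision instead of rounding intermediate values, SSE comes out near 418.1 rather than 418.2494.

```python
from math import sqrt

# Summary statistics quoted in Exercise 35, part a
n = 17
sum_x, sum_y = 222.1, 193.0
sum_x2, sum_y2, sum_xy = 3056.69, 2975.0, 2759.6
t_crit = 2.131  # t_{.025,15}, taken from a t table as in the solution

sxx = sum_x2 - sum_x ** 2 / n
sxy = sum_xy - sum_x * sum_y / n
b1 = sxy / sxx                       # estimated slope
b0 = (sum_y - b1 * sum_x) / n        # estimated intercept
sse = sum_y2 - b0 * sum_y - b1 * sum_xy
s = sqrt(sse / (n - 2))              # residual standard deviation
se_b1 = s / sqrt(sxx)                # estimated standard error of the slope

ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
t_stat = b1 / se_b1                  # statistic for H0: beta1 = 0

print(round(b1, 3), round(se_b1, 3), ci, round(t_stat, 2))
```

Swapping in the part d summary values and the 14-df critical value 2.145 gives the corresponding interval after the point is removed.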


37.
a. Let $\mu_d$ = the true mean difference in velocity between the two planes. We have 23 pairs of data that we will use to test $H_0$: $\mu_d = 0$ v. $H_a$: $\mu_d \ne 0$. From software, $\bar{x}_d = 0.2913$ with $s_d = 0.1748$, and so $t = \frac{0.2913 - 0}{0.1748/\sqrt{23}} \approx 8$, which has a two-sided P-value of 0.000 at 22 df. Hence, we strongly reject the null hypothesis and conclude there is a statistically significant difference in true average velocity in the two planes. [Note: A normal probability plot of the differences shows one mild outlier, so we have slight concern about the results of the t procedure.]

b. Let $\beta_1$ denote the true slope for the linear relationship between Level - - velocity and Level - velocity. We wish to test $H_0$: $\beta_1 = 1$ v. $H_a$: $\beta_1 < 1$. Using the relevant numbers provided, $t = \frac{b_1 - 1}{s(b_1)} = \frac{0.65393 - 1}{0.05947} = -5.8$, which has a one-sided P-value at 23 - 2 = 21 df of P(T < -5.8) ≈ 0. Hence, we strongly reject the null hypothesis and conclude the same as the authors: i.e., the true slope of this regression relationship is significantly less than 1.
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39. SSE = 124,039.58— (72.958547)( 1574.8) — (.04103377)(222657.88) = 7.9679, and SST = 39.828 


Source   df   SS       MS       f
Regr      1   31.860   31.860   71.97
Error    18    7.968     .443
Total    19   39.828


At df = (1, 18), f = 71.97 > $F_{.001,1,18} = 15.38$ implies that the P-value is less than .001. So, $H_0$: $\beta_1 = 0$ is rejected and the model is judged useful. Also, $s = \sqrt{.4427} = .6653$ and $S_{xx} = 18{,}921.8295$, so

$t = \frac{.04103377}{.6653/\sqrt{18{,}921.8295}} = 8.484$ and $t^2 = (8.484)^2 \approx 71.97 = f$, showing the equivalence of the two tests.


41. 
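a. Under the regression model, $E(Y_i) = \beta_0 + \beta_1 x_i$ and, hence, $E(\bar{Y}) = \beta_0 + \beta_1 \bar{x}$. Therefore, $E(Y_i - \bar{Y}) = \beta_1 (x_i - \bar{x})$, and

$E(\hat\beta_1) = E\left[\frac{\sum (x_i - \bar{x})(Y_i - \bar{Y})}{\sum (x_i - \bar{x})^2}\right] = \frac{\sum (x_i - \bar{x})E[Y_i - \bar{Y}]}{\sum (x_i - \bar{x})^2} = \frac{\sum (x_i - \bar{x})\beta_1 (x_i - \bar{x})}{\sum (x_i - \bar{x})^2} = \beta_1 \frac{\sum (x_i - \bar{x})^2}{\sum (x_i - \bar{x})^2} = \beta_1$.

b. Here, we'll use the fact that $\sum (x_i - \bar{x})(Y_i - \bar{Y}) = \sum (x_i - \bar{x})Y_i - \bar{Y}\sum (x_i - \bar{x}) = \sum (x_i - \bar{x})Y_i - \bar{Y}(0) = \sum (x_i - \bar{x})Y_i$. With $c = \sum (x_i - \bar{x})^2$, $\hat\beta_1 = \frac{1}{c}\sum (x_i - \bar{x})(Y_i - \bar{Y}) = \sum \frac{(x_i - \bar{x})}{c} Y_i$ => since the $Y_i$'s are independent,

$V(\hat\beta_1) = \sum \left(\frac{x_i - \bar{x}}{c}\right)^2 V(Y_i) = \frac{1}{c^2}\sum (x_i - \bar{x})^2 \sigma^2 = \frac{\sigma^2}{c} = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$ or, equivalently, $\frac{\sigma^2}{\Sigma x_i^2 - (\Sigma x_i)^2/n}$, as desired.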
43. The numerator of d is |1 - 2| = 1, and the denominator is $\frac{4\sqrt{14}}{\sqrt{324.40}} = .831$, so $d = \frac{1}{.831} = 1.20$. The approximate power curve is for n - 2 df = 13, and $\beta$ is read from Table A.17 as approximately .1.
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Section 12.4 


45.
a. We wish to find a 90% CI for $\mu_{Y \cdot 125}$: $\hat{y}_{125} = 78.088$, $t_{.05,18} = 1.734$, and $s_{\hat{Y}} = s\sqrt{\frac{1}{20} + \frac{(125 - 140.895)^2}{18{,}921.8295}} = .1674$. Putting it together, we get 78.088 ± 1.734(.1674) = (77.797, 78.378).

b. We want a 90% PI. Only the standard error changes: $s\sqrt{1 + \frac{1}{20} + \frac{(125 - 140.895)^2}{18{,}921.8295}} = .6860$, so the PI is 78.088 ± 1.734(.6860) = (76.898, 79.277).

c. Because the x* of 115 is farther away from $\bar{x}$ than the previous value, the term $(x^* - \bar{x})^2$ will be larger, making the standard error larger, and thus the width of the interval is wider.

d. We would be testing to see if the filtration rate were 125 kg-DS/m/h, would the average moisture content of the compressed pellets be less than 80%. The test statistic is $t = \frac{78.088 - 80}{.1674} = -11.42$, and with 18 df the P-value is P(T < -11.42) ≈ 0.00. Hence, we reject $H_0$. There is significant evidence to prove that the true average moisture content when filtration rate is 125 is less than 80%.

47.
a. $\hat{y}_{(40)} = -1.128 + .82697(40) = 31.95$, $t_{.025,13} = 2.160$; a 95% PI for runoff is $31.95 \pm 2.160\sqrt{(5.24)^2 + (1.44)^2} = 31.95 \pm 11.74 = (20.21, 43.69)$. No, the resulting interval is very wide, therefore the available information is not very precise.

b. $\Sigma x = 798$, $\Sigma x^2 = 63{,}040$ which gives $S_{xx} = 20{,}586.4$, which in turn gives $s_{\hat{Y}(50)} = 5.24\sqrt{\frac{1}{15} + \frac{(50 - 53.20)^2}{20{,}586.4}} = 1.358$, so the PI for runoff when x = 50 is $40.22 \pm 2.160\sqrt{(5.24)^2 + (1.358)^2} = 40.22 \pm 11.69 = (28.53, 51.92)$. The simultaneous prediction level for the two intervals is at least $100(1 - 2\alpha)\% = 90\%$.

49. 95% CI = (462.1, 597.7) => midpoint = 529.9; $t_{.025,8} = 2.306$ => $529.9 + (2.306)\,s_{\hat{Y}} = 597.7$ => $s_{\hat{Y}} = 29.402$ => 99% CI = $529.9 \pm t_{.005,8}(29.402) = 529.9 \pm (3.355)(29.402) = (431.3, 628.5)$.

51.
a. 0.40 is closer to $\bar{x}$.

b. $\hat\beta_0 + \hat\beta_1(0.40) \pm t_{.025,n-2}\, s_{\hat{Y}}$, or 0.8104 ± 2.101(0.0311) = (0.745, 0.876).

c. $\hat\beta_0 + \hat\beta_1(1.20) \pm t_{.025,n-2}\sqrt{s^2 + s_{\hat{Y}}^2}$, or $0.2912 \pm 2.101\sqrt{(0.1049)^2 + (0.0352)^2} = (.059, .523)$.

53. Choice a will be the smallest, with d being largest. The width of interval a is less than b and c (obviously), and b and c are both smaller than d. Nothing can be said about the relationship between b and c.
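The only difference between the confidence interval and the prediction interval in Exercise 45 is the extra 1 under the square root. A minimal sketch, assuming the residual standard deviation is obtained from the SSE of Exercise 39 (7.9679 on 18 df) and taking the critical value 1.734 from the solution; the helper names are illustrative:

```python
from math import sqrt

# Values quoted in Exercises 39 and 45
n, x_bar, sxx = 20, 140.895, 18_921.8295
s = sqrt(7.9679 / (n - 2))   # assumption: s = sqrt(SSE / (n - 2)) with SSE from Exercise 39
y_hat, x_star = 78.088, 125
t_crit = 1.734               # t_{.05,18}, from a t table

def se_mean(x):
    """Estimated sd of the fitted mean response at x."""
    return s * sqrt(1 / n + (x - x_bar) ** 2 / sxx)

def se_pred(x):
    """Estimated sd of the prediction error for a single new Y at x."""
    return s * sqrt(1 + 1 / n + (x - x_bar) ** 2 / sxx)

ci = (y_hat - t_crit * se_mean(x_star), y_hat + t_crit * se_mean(x_star))
pi = (y_hat - t_crit * se_pred(x_star), y_hat + t_crit * se_pred(x_star))
print(ci, pi)
```

Evaluating `se_mean(115)` against `se_mean(125)` reproduces the point of part c: the standard error grows as x* moves away from the sample mean.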


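55. $\hat\beta_0 + \hat\beta_1 x^* = (\bar{Y} - \hat\beta_1 \bar{x}) + \hat\beta_1 x^* = \bar{Y} + (x^* - \bar{x})\hat\beta_1 = \frac{1}{n}\sum Y_i + \frac{(x^* - \bar{x})\sum (x_i - \bar{x})Y_i}{S_{xx}} = \sum d_i Y_i$, where $d_i = \frac{1}{n} + \frac{(x^* - \bar{x})(x_i - \bar{x})}{S_{xx}}$. Thus, since the $Y_i$'s are independent,

$V(\hat\beta_0 + \hat\beta_1 x^*) = \sum d_i^2 V(Y_i) = \sigma^2 \sum d_i^2 = \sigma^2 \sum \left[\frac{1}{n^2} + \frac{2(x^* - \bar{x})(x_i - \bar{x})}{nS_{xx}} + \frac{(x^* - \bar{x})^2 (x_i - \bar{x})^2}{S_{xx}^2}\right]$

$= \sigma^2 \left[\frac{n}{n^2} + \frac{2(x^* - \bar{x})\sum (x_i - \bar{x})}{nS_{xx}} + \frac{(x^* - \bar{x})^2 S_{xx}}{S_{xx}^2}\right] = \sigma^2 \left[\frac{1}{n} + 0 + \frac{(x^* - \bar{x})^2}{S_{xx}}\right] = \sigma^2 \left[\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}\right]$.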


Section 12.5 


57. Most people acquire a license as soon as they become eligible. If, for example, the minimum age for obtaining a license is 16, then the time since acquiring a license, y, is usually related to age by the equation y ≈ x - 16, which is the equation of a straight line. In other words, the majority of people in a sample will have y values that closely follow the line y = x - 16.

59.
a. $S_{xx} = 251{,}970 - \frac{(1950)^2}{18} = 40{,}720$, $S_{yy} = 130.6074 - \frac{(47.92)^2}{18} = 3.033711$, and $S_{xy} = 5530.92 - \frac{(1950)(47.92)}{18} = 339.586667$, so $r = \frac{339.586667}{\sqrt{40{,}720}\sqrt{3.033711}} = .9662$. There is a very strong, positive correlation between the two variables.

b. Because the association between the variables is positive, the specimen with the larger shear force will tend to have a larger percent dry fiber weight.

c. Changing the units of measurement on either (or both) variables will have no effect on the calculated value of r, because any change in units will affect both the numerator and denominator of r by exactly the same multiplicative constant.

d. $r^2 = .9662^2 = .933$, or 93.3%.

e. We wish to test $H_0$: $\rho = 0$ v. $H_a$: $\rho > 0$. The test statistic is $t = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}} = \frac{.9662\sqrt{18 - 2}}{\sqrt{1 - .9662^2}} = 14.99$. This is "off the charts" at 16 df, so the one-tailed P-value is less than .001. So, $H_0$ should be rejected: the data indicate a positive linear relationship between the two variables.

61.
a. We are testing $H_0$: $\rho = 0$ v. $H_a$: $\rho > 0$. The correlation is r = .7482, and the test statistic is $t = \frac{.7482\sqrt{n-2}}{\sqrt{1 - .7482^2}} \approx 3.9$. At 14 df, the P-value is roughly .001. Hence, we reject $H_0$: there is evidence that a positive correlation exists between maximum lactate level and muscular endurance.

b. We are looking for $r^2$, the coefficient of determination: $r^2 = (.7482)^2 = .5598$, or about 56%. It is the same no matter which variable is the predictor.
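The correlation test used in Exercises 59 through 65 depends only on r and n. The following sketch (function names are ours) recomputes r from the Exercise 59 summary sums and evaluates the statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ for Exercise 59 and for the r and n quoted in Exercise 63.

```python
from math import sqrt

def corr_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Sample correlation coefficient from summary sums."""
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    return sxy / sqrt(sxx * syy)

def corr_t_stat(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

# Exercise 59 summary statistics (n = 18)
r59 = corr_from_sums(18, 1950, 47.92, 251_970, 130.6074, 5530.92)
t59 = corr_t_stat(r59, 18)

# Exercise 63: r = .7729 with n = 6
t63 = corr_t_stat(0.7729, 6)

print(round(r59, 4), round(t59, 2), round(t63, 2))
```

The contrast between the two statistics illustrates the remark in Exercise 63: a fairly large r can still fail to reach significance when n is small.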


63. With the aid of software, the sample correlation coefficient is r = .7729. To test $H_0$: $\rho = 0$ v. $H_a$: $\rho \ne 0$, the test statistic is $t = \frac{(.7729)\sqrt{6 - 2}}{\sqrt{1 - (.7729)^2}} = 2.44$. At 4 df, the 2-sided P-value is about 2(.035) = .07 (software gives a P-value of .072). Hence, we fail to reject $H_0$: the data do not indicate that the population correlation coefficient differs from 0. This result may seem surprising due to the relatively large size of r (.77); however, it can be attributed to a small sample size (n = 6).


65.
a. From the summary statistics provided, a point estimate for the population correlation coefficient $\rho$ is $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{44{,}185.87}{\sqrt{(64{,}732.83)(130{,}566.96)}} = .4806$.

b. The hypotheses are $H_0$: $\rho = 0$ versus $H_a$: $\rho \ne 0$. Assuming bivariate normality, the test statistic value is $t = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}} = \frac{.4806\sqrt{15 - 2}}{\sqrt{1 - .4806^2}} = 1.98$. At df = 15 - 2 = 13, the two-tailed P-value for this t test is $2P(T_{13} \ge 1.98) \approx 2P(T_{13} \ge 2.0) = 2(.033) = .066$. Hence, we fail to reject $H_0$ at the .01 level; there is not sufficient evidence to conclude that the population correlation coefficient between internal and external rotation velocity is not zero.

c. If we tested $H_0$: $\rho = 0$ versus $H_a$: $\rho > 0$, the one-sided P-value would be .033. We would still fail to reject $H_0$ at the .01 level, lacking sufficient evidence to conclude a positive true correlation coefficient. However, for a one-sided test at the .05 level, we would reject $H_0$ since P-value = .033 < .05. We have evidence at the .05 level that the true population correlation coefficient between internal and external rotation velocity is positive.


67.
a. Because P-value = .00032 < $\alpha$ = .001, $H_0$ should be rejected at this significance level.

b. Not necessarily. For such a large n, the test statistic t has approximately a standard normal distribution when $H_0$: $\rho = 0$ is true, and a P-value of .00032 corresponds to z = ±3.60. Solving $\pm 3.60 = \frac{r\sqrt{500 - 2}}{\sqrt{1 - r^2}}$ for r yields r = ± .159. That is, with n = 500 we'd obtain this P-value with r = ± .159. Such an r value suggests only a weak linear relationship between x and y, one that would typically have little practical importance.

c. The test statistic value would be $t = \frac{.022\sqrt{10{,}000 - 2}}{\sqrt{1 - .022^2}} = 2.20$; since the test statistic is again approximately normal, the 2-sided P-value would be roughly 2[1 - Φ(2.20)] = .0278 < .05, so $H_0$ is rejected in favor of $H_a$ at the .05 significance level. The value t = 2.20 is statistically significant: it cannot be attributed just to sampling variability in the case $\rho = 0$. But with this enormous n, r = .022 implies $\rho \approx .022$, indicating an extremely weak relationship.
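The step "solving for r" in part b of Exercise 67 has a closed form: squaring $t = r\sqrt{n-2}/\sqrt{1-r^2}$ and rearranging gives $r = t/\sqrt{t^2 + n - 2}$. The sketch below applies it; the function names are illustrative, and the sample size 10,000 used for part c is an assumption inferred from the statistic 2.20, since the extracted text does not show it.

```python
from math import sqrt

def r_for_statistic(t, n):
    """Invert t = r*sqrt(n-2)/sqrt(1-r^2) to get r = t/sqrt(t^2 + n - 2)."""
    return t / sqrt(t * t + n - 2)

def corr_t_stat(r, n):
    """Forward direction: test statistic for H0: rho = 0."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

r_b = r_for_statistic(3.60, 500)      # part b: z = 3.60 with n = 500
t_c = corr_t_stat(0.022, 10_000)      # part c: n = 10,000 is an assumed value

print(round(r_b, 3), round(t_c, 2))
```

Both results make the same point as the solution: with very large n, statistical significance says little about the strength of the relationship.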


Supplementary Exercises 


69. Use available software for all calculations.

a. We want a confidence interval for $\beta_1$. From software, $b_1$ = 0.987 and $s(b_1)$ = 0.047, so the corresponding 95% CI is $0.987 \pm t_{.025,17}(0.047) = 0.987 \pm 2.110(0.047) = (0.888, 1.086)$. We are 95% confident that the true average change in sale price associated with a one-foot increase in truss height is between $0.89 per square foot and $1.09 per square foot.

b. Using software, a 95% CI for $\mu_{Y \cdot 25}$ is (47.730, 49.172). We are 95% confident that the true average sale price for all warehouses with 25-foot truss height is between $47.73/ft² and $49.17/ft².

c. Again using software, a 95% PI for Y when x = 25 is (45.378, 51.524). We are 95% confident that the sale price for a single warehouse with 25-foot truss height will be between $45.38/ft² and $51.52/ft².

d. Since x = 25 is nearer the mean than x = 30, a PI at x = 30 would be wider.

e. From software, $r^2$ = SSR/SST = 890.36/924.44 = .963. Hence, $r = \sqrt{.963} = .981$.


71. Use software whenever possible.

a. From software, the estimated coefficients are $\hat\beta_1 = 16.0593$ and $\hat\beta_0 = 0.1925$.

b. Test $H_0$: $\beta_1 = 0$ versus $H_a$: $\beta_1 \ne 0$. From software, the test statistic is $t = \frac{\hat\beta_1}{s_{\hat\beta_1}} = 54.15$; even at just 7 df, this is "off the charts" and the P-value is ≈ 0. Hence, we strongly reject $H_0$ and conclude that a statistically significant relationship exists between the variables.

c. From software or by direct computation, residual sd = s = .2626, $\bar{x}$ = .408 and $S_{xx}$ = .784. When x = x* = .2, $\hat{y} = 0.1925 + 16.0593(.2) = 3.404$ with an estimated standard deviation of

$s_{\hat{Y}} = s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} = .2626\sqrt{\frac{1}{9} + \frac{(.2 - .408)^2}{.784}} = .107$.

The analogous calculations when x = x* = .4 result in $\hat{y} = 6.616$ and $s_{\hat{Y}} = .088$, confirming what's claimed. Prediction error is larger when x = .2 because .2 is farther from the sample mean of .408 than is x = .4.

d. A 95% CI for $\mu_{Y \cdot .4}$ is $\hat{y} \pm t_{.025,7}\, s_{\hat{Y}} = 6.616 \pm 2.365(.088) = (6.41, 6.82)$.

e. A 95% PI for Y when x = .4 is $\hat{y} \pm t_{.025,7}\sqrt{s^2 + s_{\hat{Y}}^2} = (5.96, 7.27)$.

73.
a. From the output, $r^2$ = .5073.

b. $r = (\text{sign of slope}) \cdot \sqrt{r^2} = +\sqrt{.5073} = .7122$.

c. We test $H_0$: $\beta_1 = 0$ versus $H_a$: $\beta_1 \ne 0$. The test statistic t = 3.93 gives P-value = .0013, which is < .01, the given level of significance; therefore we reject $H_0$ and conclude that the model is useful.

d. We use a 95% CI for $\mu_{Y \cdot 50}$. $\hat{y}(50) = .787218 + .007570(50) = 1.165718$; $t_{.025,15} = 2.131$; s = "Root MSE" = .20308 => $s_{\hat{Y}} = .20308\sqrt{\frac{1}{17} + \frac{17(50 - 42.33)^2}{17(41{,}575) - (719.60)^2}} = .051422$. The resulting 95% CI is 1.165718 ± 2.131(.051422) = 1.165718 ± .109581 = (1.056137, 1.275299).

e. Our prediction is $\hat{y}(30) = .787218 + .007570(30) = 1.0143$, with a corresponding residual of $y - \hat{y}$ = .80 - 1.0143 = -.2143.


75.
a. With y = stride rate and x = speed, we have $\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{660.130 - (205.4)(35.16)/11}{S_{xx}} = \frac{3.597}{44.702} = 0.080466$ and $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = (35.16/11) - 0.080466(205.4)/11 = 1.694$. So, the least squares line for predicting stride rate from speed is $\hat{y} = 1.694 + 0.080466x$.

b. With y = speed and x = stride rate, we have $\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{660.130 - (35.16)(205.4)/11}{112.681 - (35.16)^2/11} = \frac{3.597}{0.297} = 12.117$ and $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = (205.4)/11 - 12.117(35.16/11) = -20.058$. So, the least squares line for predicting speed from stride rate is $\hat{y} = -20.058 + 12.117x$.

c. The fastest way to find $r^2$ from the available information is $r^2 = \hat\beta_1^2 \frac{S_{xx}}{S_{yy}}$. For the first regression, this gives $r^2 = (0.080466)^2 \frac{44.702}{0.297} = .97$. For the second regression, $r^2 = (12.117)^2 \frac{0.297}{44.702} = .97$ as well. In fact, rounding error notwithstanding, these two values should be exactly the same.
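The claim in part c of Exercise 75, that both regressions share the same $r^2$, follows because the product of the two slopes is $S_{xy}^2/(S_{xx}S_{yy})$. A short sketch using the summary values quoted in the solution (the variable names are ours; 44.702 is taken as given for the speed sum of squares):

```python
n = 11
sum_speed, sum_rate = 205.4, 35.16
sum_rate2, sum_prod = 112.681, 660.130
s_speed = 44.702                              # S for speed, as quoted in the solution

s_rate = sum_rate2 - sum_rate ** 2 / n        # S for stride rate
s_cross = sum_prod - sum_speed * sum_rate / n # S_xy

b_rate_on_speed = s_cross / s_speed           # slope in part a
b_speed_on_rate = s_cross / s_rate            # slope in part b

r2 = s_cross ** 2 / (s_speed * s_rate)
print(round(b_rate_on_speed, 4), round(b_speed_on_rate, 3), round(r2, 3))
```

Multiplying `b_rate_on_speed` by `b_speed_on_rate` returns the same `r2`, which is why the two regressions must agree apart from rounding.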


77.
a. Yes: the accompanying scatterplot suggests an extremely strong, positive, linear relationship between the amount of oil added to the wheat straw and the amount recovered.

[Scatterplot; x-axis: Amount of oil added (g)]

b. Pieces of Minitab output appear below. From the output, $r^2$ = 99.6% or .996. That is, 99.6% of the total variation in the amount of oil recovered in the wheat straw can be explained by a linear regression on the amount of oil added to it.

Predictor   Coef      SE Coef   T       P
Constant    -0.5234   0.1453    -3.60   0.003
x           0.87825   0.01610   54.56   0.000

S = 0.311816   R-Sq = 99.6%   R-Sq(adj) = 99.5%

Predicted Values for New Observations

New Obs   Fit      SE Fit   95% CI             95% PI
1         3.8678   0.0901   (3.6732, 4.0625)   (3.1666, 4.5690)

c. Refer to the preceding Minitab output. A test of $H_0$: $\beta_1 = 0$ versus $H_a$: $\beta_1 \ne 0$ returns a test statistic of t = 54.56 and a P-value of ≈ 0, from which we can strongly reject $H_0$ and conclude that a statistically significant linear relationship exists between the variables. (No surprise, based on the scatterplot!)

d. The last line of the preceding Minitab output comes from requesting predictions at x = 5.0 g. The resulting 95% PI is (3.1666, 4.5690). So, at a 95% prediction level, the amount of oil recovered from wheat straw when the amount added was 5.0 g will fall between 3.1666 g and 4.5690 g.

e. A formal test of $H_0$: $\rho = 0$ versus $H_a$: $\rho \ne 0$ is completely equivalent to the t test for slope conducted in c. That is, the test statistic and P-value would once again be t = 54.56 and P ≈ 0, leading to the conclusion that $\rho \ne 0$.


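79. Start with the alternative formula $SSE = \Sigma y^2 - \hat\beta_0 \Sigma y - \hat\beta_1 \Sigma xy$. Substituting $\hat\beta_0 = \frac{\Sigma y - \hat\beta_1 \Sigma x}{n}$,

$SSE = \Sigma y^2 - \frac{\Sigma y - \hat\beta_1 \Sigma x}{n}\Sigma y - \hat\beta_1 \Sigma xy = \Sigma y^2 - \frac{(\Sigma y)^2}{n} + \frac{\hat\beta_1 \Sigma x \Sigma y}{n} - \hat\beta_1 \Sigma xy = \left[\Sigma y^2 - \frac{(\Sigma y)^2}{n}\right] - \hat\beta_1\left[\Sigma xy - \frac{\Sigma x \Sigma y}{n}\right] = S_{yy} - \hat\beta_1 S_{xy}$.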

x,-xyY _ § tye > 
a. Recall that r= gO! ee and similarly s» =~. Using these formulas, 


ee n-l nhs * n=l 


reVSa5y° ys (n—l)s? 
p= ey joe 
a a 2 (n—l)s, 


squares equation becomes p= f, + f.x=¥+ B(x—-X)= 


~~ 


8s A A 
-—. Using the fact that 6, = ¥— 2X, the least 
s 


s ‘ 

Vtr-—(x-X) , as desired. 

b. In Exercise 64, r= .700. So, a specimen whose UV transparency index is | standard deviation below 
average is predicted to have a maximum prevalence of infection that is .7 standard deviations below 
average. 


83. Remember that $SST = S_{yy}$, and use Exercise 79 to write $SSE = S_{yy} - \hat\beta_1 S_{xy} = S_{yy} - S_{xy}^2/S_{xx}$. Then

$r^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}} = \frac{S_{xy}^2/S_{xx}}{S_{yy}} = \frac{S_{yy} - SSE}{S_{yy}} = 1 - \frac{SSE}{S_{yy}} = 1 - \frac{SSE}{SST}$.

85. Using Minitab, we create a scatterplot to see if a linear regression model is appropriate.

[Scatterplot: blood glucose level versus time, with time from 0 to 60]


A linear model is reasonable; although it appears that the variance in y gets larger as x increases. The 
Minitab output follows: 


The regression equation is 


blood glucose level = 3.70 + 0.0379 time 

Predictor Coef StDev T P 

Constant 3.6965 0.2159 17.12 0.000 

time 0.037895 0.006137 6.17 0.000 

S = 0.5525 R-Sq = 63.4% R-Sq(adj) = 61.7% 

Analysis of Variance 

Source          DF   SS       MS       F       P
Regression       1   11.638   11.638   38.12   0.000
Residual Error  22    6.716    0.305
Total           23   18.353


The coefficient of determination of 63.4% indicates that only a moderate percentage of the variation in y can be explained by the change in x. A test of model utility indicates that time is a significant predictor of blood glucose level (t = 6.17, P ≈ 0). A point estimate for blood glucose level when time = 30 minutes is 4.833%. We would expect the average blood glucose level at 30 minutes to be between 4.599 and 5.067, with 95% confidence.


87. From the SAS output in Exercise 73, $n_1$ = 17, $SSE_1$ = 0.61860, $\hat\beta_1$ = 0.007570; by direct computation, $SS_{x1}$ = 11,114.6. The pooled estimated variance is $\hat\sigma^2 = \frac{.61860 + .51350}{28} = .040432$, and the calculated test statistic for testing $H_0$: $\beta_1 = \gamma_1$ is

$t = \frac{\hat\beta_1 - \hat\gamma_1}{\sqrt{.040432}\sqrt{\frac{1}{11114.6} + \frac{1}{7152.5578}}} \approx 0.24$.

At 28 df, the two-tailed P-value is roughly 2(.39) = .78. With such a large P-value, we do not reject $H_0$ at any reasonable level (in particular, .78 > .05). The data do not provide evidence that the expected change in wear loss associated with a 1% increase in austenite content is different for the two types of abrasive; it is plausible that $\beta_1 = \gamma_1$.
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CHAPTER 13 


1.
a. $\bar{x} = 15$ and $\sum (x_i - \bar{x})^2 = 250$, so the standard deviation of the residual $Y_i - \hat{Y}_i$ is $10\sqrt{1 - \frac{1}{5} - \frac{(x_i - 15)^2}{250}}$ = 6.32, 8.37, 8.94, 8.37, and 6.32 for i = 1, 2, 3, 4, 5.

c. The deviation from the estimated line is likely to be much smaller for the observation made in the experiment of b for x = 50 than for the experiment of a when x = 25. That is, the observation (50, Y) is more likely to fall close to the least squares line than is (25, Y).

3.
a. This plot indicates there are no outliers, the variance of $\epsilon$ is reasonably constant, and the $\epsilon$ are normally distributed. A straight-line regression function is a reasonable choice for a model.

[Residual plot versus filtration rate]

b. We need $S_{xx} = \sum (x_i - \bar{x})^2 = 415{,}914.85 - \frac{(2817.9)^2}{20} = 18{,}886.8295$. Then each $e_i^*$ can be calculated as follows: $e_i^* = \frac{e_i}{s\sqrt{1 - \frac{1}{20} - \frac{(x_i - 140.895)^2}{18{,}886.8295}}}$. The table below shows the values. Notice that if $e_i^* \approx e_i/s$, then $e_i/e_i^* \approx s$. All of the $e_i/e_i^*$'s range between .57 and .65, which are close to s.

standardized               standardized
residuals      e_i/e_i*    residuals      e_i/e_i*
0.31064        0.644053    0.6175         0.64218
0.30593        0.614697    0.09062        0.64802
0.4791         0.578669    1.16776        0.565003
1.2307         0.647714    1.50205        0.646461
1.15021        0.648002    0.96313        0.648257
0.34881        0.643706    0.019          0.643881
0.09872        0.633428    0.65644        0.584858
1.39034        0.640683    2.1562         0.647182
0.82185        0.640975    -0.79038       0.642113
0.15998        0.621857    1.73943        0.631795

c. This plot looks very much the same as the one in part a.

[Plot of standardized residuals versus filtration rate, with filtration rate from 100 to 200]


a. 97.7% of the variation in ice thickness can be explained by the linear relationship between it and 
elapsed time. Based on this value, it is tempting to assume an approximately linear relationship; 
however, 7° does not measure the aptness of the linear model. 


b. The residual plot shows a curve in the data, suggesting a non-linear relationship exists. One 
observation (5.5, —3.14) is extreme. 


[Figure: residual plot]


From software and the data provided, the least squares line is $\hat{y} = 84.4 - 290x$. Also from software, the
coefficient of determination is $r^2$ = 77.6% or .776.


Regression Analysis: y versus x

The regression equation is
y = 84.4 - 290 x

Predictor      Coef   SE Coef       T       P
Constant      84.38     11.64    7.25   0.000
x           -289.79     43.12   -6.72   0.000


S = 2.72669 R-Sq = 77.6% R-Sq(adj) = 75.9% 


The accompanying scatterplot exhibits substantial curvature, which suggests that a straight-line model 
is not actually a good fit. 


[Figure: scatterplot of y versus x]


Fits, residuals, and standardized residuals were computed using software and the accompanying plot 
was created. The residual-versus-fit plot indicates very strong curvature but not a lack of constant 
variance. This implies that a linear model is inadequate, and a quadratic (parabolic) model relationship 
might be suitable for x and y. 


[Figure: standardized residuals versus fitted values]
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Both a scatter plot and residual plot (based on the simple linear regression model) for the first data set 
suggest that a simple linear regression model is reasonable, with no pattern or influential data points which 
would indicate that the model should be modified. However, scatter plots for the other three data sets 
reveal difficulties. 


[Figures: scatter plots for data sets #1, #2, #3, and #4]


For data set #2, a quadratic function would clearly provide a much better fit. For data set #3, the 
relationship is perfectly linear except one outlier, which has obviously greatly influenced the fit even 
though its x value is not unusually large or small. One might investigate this observation to see whether it 
was mistyped and/or it merits deletion. For data set #4 it is clear that the slope of the least squares line has 
been determined entirely by the outlier, so this point is extremely influential. A linear model is completely 
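inappropriate for data set #4.


a. $Y_i - \hat{Y}_i = Y_i - \bar{Y} - \hat{\beta}_1(x_i - \bar{x}) = Y_i - \frac{1}{n}\sum_j Y_j - \frac{(x_i - \bar{x})\sum_j (x_j - \bar{x})Y_j}{\sum_j (x_j - \bar{x})^2} = \sum_j c_j Y_j$, where
$c_j = 1 - \frac{1}{n} - \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$ for $j = i$ and $c_j = -\frac{1}{n} - \frac{(x_i - \bar{x})(x_j - \bar{x})}{\sum_j (x_j - \bar{x})^2}$ for $j \ne i$. Thus
$V(Y_i - \hat{Y}_i) = \sum_j V(c_j Y_j)$ (since the $Y_j$'s are independent) $= \sigma^2 \sum_j c_j^2$, which, after some algebra, gives
Equation (13.2).


b. $\sigma^2 = V(Y_i) = V(\hat{Y}_i + (Y_i - \hat{Y}_i)) = V(\hat{Y}_i) + V(Y_i - \hat{Y}_i)$, so
$V(Y_i - \hat{Y}_i) = \sigma^2 - V(\hat{Y}_i) = \sigma^2 - \sigma^2\left[\frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}\right]$, which is exactly (13.2).


c. As $x_i$ moves further from $\bar{x}$, $(x_i - \bar{x})^2$ grows larger, so $V(\hat{Y}_i)$ increases since $(x_i - \bar{x})^2$ has a positive
sign in $V(\hat{Y}_i)$, but $V(Y_i - \hat{Y}_i)$ decreases since $(x_i - \bar{x})^2$ has a negative sign in that expression.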


The distribution of any particular standardized residual is also a t distribution with n − 2 d.f., since $e_i^*$ is
obtained by taking the standard normal variable $\frac{Y_i - \hat{Y}_i}{\sigma_{Y_i - \hat{Y}_i}}$ and substituting the estimate of $\sigma$ in the denominator
(exactly as in the predicted value case). With $E_i^*$ denoting the $i$th standardized residual as a random
variable, when n = 25 $E_i^*$ has a t distribution with 23 df and $t_{.01,23} = 2.50$, so $P(E_i^* \text{ outside } (-2.50, 2.50)) =
P(E_i^* > 2.50) + P(E_i^* < -2.50) = .01 + .01 = .02$.
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Section 13.2 


15. 
a. The scatterplot of y versus x below, left has a curved pattern. A linear model would not be appropriate. 


b. The scatterplot of ln(y) versus ln(x) below, right exhibits a strong linear pattern.


[Figures: scatterplot of y vs x (left); scatterplot of ln(y) vs ln(x) (right)]


c. The linear pattern in b above would indicate that a transformed regression using the natural log of both
x and y would be appropriate. The probabilistic model is then $y = \alpha x^{\beta} \cdot \epsilon$, the power function with an
error term.


d. A regression of ln(y) on ln(x) yields the equation $\ln(\hat{y}) = 4.6384 - 1.04920\,\ln(x)$. Using Minitab we
can get a PI for y when x = 20 by first transforming the x value: ln(20) = 2.996. The computer
generated 95% PI for ln(y) when ln(x) = 2.996 is (1.1188, 1.8712). We must now take the antilog to
return to the original units of y: $(e^{1.1188}, e^{1.8712}) = (3.06, 6.50)$.
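
As a quick check of the back-transformation in part d, the sketch below exponentiates the endpoints of the log-scale prediction interval and also evaluates the fitted line at x = 20. It only reuses numbers quoted above; the variable names are illustrative.

```python
import math

# Fitted line on the log-log scale, from the solution.
b0, b1 = 4.6384, -1.04920
log_fit = b0 + b1 * math.log(20)

# 95% PI for ln(y) at x = 20 as reported by software in the solution.
pi_log_scale = (1.1188, 1.8712)

# Exponentiating both endpoints gives a PI for y in its original units.
pi_original = tuple(math.exp(v) for v in pi_log_scale)

print(round(log_fit, 3))
print(tuple(round(v, 2) for v in pi_original))
```

The fitted value on the log scale sits at the midpoint of the log-scale interval, but the back-transformed interval is not symmetric about $e^{\ln(\hat{y})}$.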


e. A computer generated residual analysis: 


[Figure: residual plots for ln(y): normal probability plot, residuals versus fits, histogram, residuals versus order]


Looking at the residual vs. fits (bottom right), one standardized residual, corresponding to the third 
observation, is a bit large. There are only two positive standardized residuals, but two others are 
essentially 0. The patterns in the residual plot and the normal probability plot (upper left) are 
marginally acceptable.


$\sum x_i' = 15.501$, $\sum y_i' = 13.352$, $\sum x_i'^2 = 20.228$, $\sum y_i'^2 = 16.572$, $\sum x_i' y_i' = 18.109$, from which
$\hat{\beta}_1 = 1.254$ and $\hat{\beta}_0 = -.468$, so $\hat{\beta} = \hat{\beta}_1 = 1.254$ and $\hat{\alpha} = e^{-.468} = .626$.
The plots give strong support to this choice of model; in addition, $r^2$ = .960 for the transformed data.


SSE = .11536 (computer printout), s = .1024, and the estimated sd of $\hat{\beta}_1$ is .0775, so
$t = \frac{1.25 - 4/3}{.0775} = -1.02$, for a P-value at 11 df of P(T < −1.02) ≈ .16. Since .16 > .05, $H_0$ cannot be
rejected in favor of $H_a$.


The claim that $\mu_{Y \cdot 5} = 2\mu_{Y \cdot 2.5}$ is equivalent to $\alpha \cdot 5^{\beta} = 2\alpha(2.5)^{\beta}$, or that $\beta = 1$. Thus we wish to test
$H_0$: $\beta = 1$ versus $H_a$: $\beta \ne 1$. With $t = \frac{1.25 - 1}{.0775} = 3.28$, the 2-sided P-value at 11 df is roughly 2(.004) =
.008. Since .008 < .01, $H_0$ is rejected at level .01.


No, there is definite curvature in the plot. 


With x = temperature and y = lifetime, a linear relationship between ln(lifetime) and 1/temperature
implies a model $y = \exp(\alpha + \beta/x + \epsilon)$. Let $x'$ = 1/temperature and $y'$ = ln(lifetime). Plotting $y'$ vs. $x'$ gives
a plot which has a pronounced linear appearance (and, in fact, $r^2$ = .954 for the straight line fit).


$\sum x_i' = .082273$, $\sum y_i' = 123.64$, $\sum x_i'^2 = .00037813$, $\sum y_i'^2 = 879.88$, $\sum x_i' y_i' = .57295$, from which
$\hat{\beta} = 3735.4485$ and $\hat{\alpha} = -10.2045$ (values read from computer output). With x = 220, $x'$ = .004545
so $\hat{y}' = -10.2045 + 3735.4485(.004545) = 6.7748$ and thus $\hat{y} = e^{6.7748} = 875.50$.


For the transformed data, SSE = 1.39857, and $n_1 = n_2 = n_3 = 6$, $\bar{y}'_{1\cdot} = 8.44695$, $\bar{y}'_{2\cdot} = 6.83157$,
$\bar{y}'_{3\cdot} = 5.32891$, from which SSPE = 1.36594, SSLF = .02993, $f = \frac{.02993/1}{1.36594/15} = .33$. Comparing this
to the F distribution with df = (1, 15), it is clear that $H_0$ cannot be rejected.
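
The two calculations in this solution (the back-transformed prediction at x = 220 and the lack-of-fit F ratio) can be reproduced with a few lines. This is a sketch under the stated numbers only; the sums of squares are simply the values quoted above, not recomputed from data, and the variable names are illustrative.

```python
import math

alpha_hat = -10.2045     # intercept on the transformed scale (from computer output)
beta_hat = 3735.4485     # slope for x' = 1/temperature

temperature = 220
y_prime_hat = alpha_hat + beta_hat / temperature   # predicted ln(lifetime)
lifetime_hat = math.exp(y_prime_hat)               # back-transformed prediction

# Lack-of-fit F ratio from the quoted sums of squares and degrees of freedom.
sslf, df_lf = 0.02993, 1
sspe, df_pe = 1.36594, 15
f_stat = (sslf / df_lf) / (sspe / df_pe)

print(round(y_prime_hat, 4), round(lifetime_hat, 1), round(f_stat, 2))
```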
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a. The accompanying scatterplot, left, shows a very strong non-linear association between the variables. 
The corresponding residual plot would look somewhat like a downward-facing parabola. 


b. The right scatterplot shows y versus 1/x and exhibits a much more linear pattern. We'd anticipate an $r^2$
value very near 1 based on the plot. (In fact, $r^2$ = .998.)


[Figures: scatterplot of y versus x (left); scatterplot of y versus 1/x (right)]


c. With the aid of software, a 95% PI for y when x = 100, aka x’ = 1/x = 1/100 = .01, can be generated. 
Using Minitab, the 95% PI is (83.89, 87.33). That is, at a 95% prediction level, the nitrogen extraction 
percentage for a single run when leaching time equals 100 h is between 83.89 and 87.33. 


$V(Y) = V(\alpha e^{\beta x} \cdot \epsilon) = \left[\alpha e^{\beta x}\right]^2 \cdot V(\epsilon) = \alpha^2 e^{2\beta x} \cdot \tau^2$, where we have set $V(\epsilon) = \tau^2$. If $\beta > 0$, this is an
increasing function of x so we expect more spread in y for large x than for small x, while the situation is
reversed if $\beta < 0$. It is important to realize that a scatter plot of data generated from this model will not
spread out uniformly about the exponential regression function throughout the range of x values; the spread
will only be uniform on the transformed scale. Similar results hold for the multiplicative power model.


First, the test statistic for the hypotheses $H_0$: $\beta_1 = 0$ versus $H_a$: $\beta_1 \ne 0$ is z = −4.58 with a corresponding
P-value of .000, suggesting noise level has a highly statistically significant relationship with people's
perception of the acceptability of the work environment. The negative value indicates that the likelihood of
finding work environment acceptable decreases as the noise level increases (not surprisingly). We estimate
that a 1 dBA increase in noise level decreases the odds of finding the work environment acceptable by a
multiplicative factor of .70 (95% CI: .60 to .81).


The accompanying plot shows $\hat{\pi} = \dfrac{e^{23.2 - .359x}}{1 + e^{23.2 - .359x}}$. Notice that the estimated probability of finding
work environment acceptable decreases as noise level, x, increases.
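
To see how the fitted logistic curve behaves, the sketch below evaluates the estimated probability at a few noise levels and computes the odds multiplier per 1 dBA. The coefficients 23.2 and −.359 come from the solution; the particular noise levels in the loop are illustrative assumptions, not values from the exercise.

```python
import math

b0, b1 = 23.2, -0.359    # fitted logistic regression coefficients from the solution

def prob_acceptable(noise_dba: float) -> float:
    """Estimated probability that the work environment is judged acceptable."""
    z = b0 + b1 * noise_dba
    return 1 / (1 + math.exp(-z))   # algebraically equal to e^z / (1 + e^z)

# Each 1 dBA increase multiplies the odds of "acceptable" by e^{b1}.
odds_ratio = math.exp(b1)

for x in (55, 60, 65, 70, 75):      # hypothetical noise levels in dBA
    print(x, round(prob_acceptable(x), 3))
print(round(odds_ratio, 2))
```

The estimated probability equals .5 where $23.2 - .359x = 0$, that is, near x = 64.6 dBA.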


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Section 13.3 


27. 
a. A scatter plot of the data indicated a quadratic regression model might be appropriate.


b. $\hat{y} = 84.482 - 15.875(6) + 1.7679(6)^2 = 52.88$; residual $= y - \hat{y} = 53 - 52.88 = .12$


c. $SST = \sum y_i^2 - \frac{(\sum y_i)^2}{n} = 586.88$, so $R^2 = 1 - \frac{61.77}{586.88} = .895$.


d. None of the standardized residuals exceeds 2 in magnitude, suggesting none of the observations are
outliers. The ordered z percentiles needed for the normal probability plot are −1.53, −.89, −.49, −.16,
.16, .49, .89, and 1.53. The normal probability plot below does not exhibit any troublesome features.


e. $\hat{\mu}_{Y \cdot 6} = 52.88$ (from b) and $t_{.025,n-3} = t_{.025,5} = 2.571$, so the CI is
$52.88 \pm (2.571)(1.69) = 52.88 \pm 4.34 = (48.54, 57.22)$.


f. SSE = 61.77, so $s^2 = \frac{61.77}{5} = 12.35$ and $s\{\text{pred}\} = \sqrt{12.35 + (1.69)^2} = 3.90$. The PI is
$52.88 \pm (2.571)(3.90) = 52.88 \pm 10.03 = (42.85, 62.91)$.
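
Parts b, e, and f follow one pattern: compute the fitted value, then widen it by a t multiple of the appropriate standard error. The sketch below reproduces that arithmetic from the quoted quantities; the critical value 2.571 is taken from a t table as in the solution rather than computed, and the variable names are illustrative.

```python
import math

b0, b1, b2 = 84.482, -15.875, 1.7679   # fitted quadratic coefficients
x = 6
y_hat = b0 + b1 * x + b2 * x ** 2

sse, df_error = 61.77, 5
s2 = sse / df_error       # estimate of the error variance
se_fit = 1.69             # estimated sd of the fitted value at x = 6 (from the solution)
t_crit = 2.571            # t_{.025,5}, read from a t table

# Confidence interval for the mean response at x = 6.
ci = (y_hat - t_crit * se_fit, y_hat + t_crit * se_fit)

# Prediction interval: add the error variance to the variance of the fit.
se_pred = math.sqrt(s2 + se_fit ** 2)
pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)

print(round(y_hat, 2), [round(v, 2) for v in ci], [round(v, 2) for v in pi])
```

Small differences in the last digit relative to the printed intervals can arise because the solution rounds intermediate values before combining them.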
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29. 
a. The table below displays the y-values, fits, and residuals. From this, SSE = $\sum e^2$ = 16.8,
$s^2$ = SSE/(n − 3) = 4.2, and s = 2.05.

 y     $\hat{y}$       $e = y - \hat{y}$
81    82.1342    -1.13420
83    80.7771     2.22292
79    79.8502    -0.85022
75    72.8583     2.14174
70    72.1567    -2.15670
43    43.6398    -0.63985
22    21.5837     0.41630


b. $SST = \sum (y - \bar{y})^2 = \sum (y - 64.71)^2 = 3233.4$, so $R^2 = 1 - SSE/SST = 1 - 16.8/3233.4 = .995$, or 99.5%.
99.5% of the variation in free-flow can be explained by the quadratic regression relationship with
viscosity.


c. We want to test the hypotheses $H_0$: $\beta_2 = 0$ v. $H_a$: $\beta_2 \ne 0$. Assuming all inference assumptions are met,
the relevant t statistic is $t = \frac{\hat{\beta}_2}{s_{\hat{\beta}_2}} = \frac{-.0031662}{.0004835} = -6.55$. At n − 3 = 4 df, the corresponding P-value is
2P(T > 6.55) < .004. At any reasonable significance level, we would reject $H_0$ and conclude that the
quadratic predictor indeed belongs in the regression model.


d. Two intervals with at least 95% simultaneous confidence requires individual confidence equal to
100% − 5%/2 = 97.5%. To use the t-table, round up to 98%: $t_{.01,4} = 3.747$. The two confidence intervals
are $2.1885 \pm 3.747(.4050) = (.671, 3.706)$ for $\beta_1$ and $-.0031662 \pm 3.747(.0004835) =
(-.00498, -.00135)$ for $\beta_2$. [In fact, we are at least 96% confident $\beta_1$ and $\beta_2$ lie in these intervals.]


e. Plug into the regression equation to get $\hat{y} = 72.858$. Then, with $t_{.025,4} = 2.776$, a 95% CI for $\mu_{Y \cdot 400}$ is
$72.858 \pm 2.776(1.198) = (69.531, 76.186)$. For the PI, $s\{\text{pred}\} = \sqrt{s^2 + s_{\hat{Y}}^2} = \sqrt{4.2 + (1.198)^2} = 2.374$, so a 95% PI for Y when
x = 400 is $72.858 \pm 2.776(2.374) = (66.271, 79.446)$.
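
The interval endpoints in part e are consistent with the critical value $t_{.025,4} = 2.776$, which the short check below uses. This is an illustrative sketch: the t value is hard-coded from a t table, and the remaining inputs are the quantities quoted in the solution.

```python
import math

y_hat = 72.858     # fitted value at x = 400
se_fit = 1.198     # estimated sd of the fitted value
s2 = 4.2           # SSE / (n - 3)
t_crit = 2.776     # t_{.025,4}, read from a t table

ci = (y_hat - t_crit * se_fit, y_hat + t_crit * se_fit)

se_pred = math.sqrt(s2 + se_fit ** 2)
pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)

print([round(v, 3) for v in ci], round(se_pred, 3), [round(v, 3) for v in pi])
```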


31. 
a. $R^2$ = 98.0% or .980. This means 98.0% of the observed variation in energy output can be attributed to
the model relationship.


b. For a quadratic model (k = 2), adjusted $R^2 = \frac{(n-1)R^2 - k}{n - 1 - k} = .759$, or 75.9%. (A more
precise answer, from software, is 75.95%.) The adjusted $R^2$ value for the cubic model is 97.7%, as seen
in the output. This suggests that the cubic term greatly improves the model: the cost of adding an extra
parameter is more than compensated for by the improved fit.


c. To test the utility of the cubic term, the hypotheses are $H_0$: $\beta_3 = 0$ versus $H_a$: $\beta_3 \ne 0$. From the Minitab
output, the test statistic is t = 14.18 with a P-value of .000. We strongly reject $H_0$ and conclude that the
cubic term is a statistically significant predictor of energy output, even in the presence of the lower
terms.


d. Plug x = 30 into the cubic estimated model equation to get $\hat{y} = 6.44$. From software, a 95% CI for $\mu_{Y \cdot 30}$
is (6.31, 6.57). Alternatively, $\hat{y} \pm t_{.025,20}\, s_{\hat{Y}} = 6.44 \pm 2.086(.0611)$ also gives (6.31, 6.57). Next, a 95% PI
for $Y \cdot 30$ is (6.06, 6.81) from software. Or, using the information provided, $\hat{y} \pm t_{.025,20}\sqrt{s^2 + s_{\hat{Y}}^2} =
6.44 \pm 2.086\sqrt{(.1684)^2 + (.0611)^2}$ also gives (6.06, 6.81). The value of s comes from the Minitab
output, where s = .168354.


e. The null hypothesis states that the true mean energy output when the temperature difference is 35°K is
equal to 5 W; the alternative hypothesis says this isn't true.
Plug x = 35 into the cubic regression equation to get $\hat{y} = 4.709$. Then the test statistic is
$t = \frac{4.709 - 5}{.0523} = -5.6$, and the two-tailed P-value at df = 20 is approximately 2(.000) = .000. Hence, we
strongly reject $H_0$ (in particular, .000 < .05) and conclude that $\mu_{Y \cdot 35} \ne 5$.


Alternatively, software or direct calculation provides a 95% CI for $\mu_{Y \cdot 35}$ of (4.60, 4.82). Since this CI
does not include 5, we can reject $H_0$ at the .05 level.


33. 


a. $\bar{x} = 20$ and $s_x = 10.8012$ so $x' = \frac{x - 20}{10.8012}$. For x = 20, $x' = 0$, and $\hat{y} = \hat{\beta}_0^* = .9671$. For x = 25, $x'$ =
.4629, so $\hat{y} = .9671 - .0502(.4629) - .0176(.4629)^2 + .0062(.4629)^3 = .9407$.


b. $\hat{y} = .9671 - .0502\left(\frac{x - 20}{10.8012}\right) - .0176\left(\frac{x - 20}{10.8012}\right)^2 + .0062\left(\frac{x - 20}{10.8012}\right)^3$
$= .00000492x^3 - .000446058x^2 + .007290688x + .96034944$.


c. $t = \frac{\hat{\beta}_3^*}{s_{\hat{\beta}_3^*}} = \frac{.0062}{.0031} = 2.00$. At df = n − 4 = 3, the P-value is 2(.070) = .140 > .05. Therefore, we cannot reject $H_0$;
the cubic term should be deleted.


d. $SSE = \sum (y_i - \hat{y}_i)^2$ and the $\hat{y}_i$'s are the same from the standardized as from the unstandardized model,
so SSE, SST, and $R^2$ will be identical for the two models.


e. $\sum y_i^2 = 6.355538$, $\sum y_i = 6.664$, so SST = .011410. For the quadratic model, $R^2$ = .987, and for the
cubic model, $R^2$ = .994. The two $R^2$ values are very close, suggesting intuitively that the cubic term is
relatively unimportant.
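
The expansion in part b can be verified by collecting powers of x after substituting $x' = (x - \bar{x})/s_x$. The following sketch does that algebra numerically and confirms that the standardized and unstandardized forms give the same fitted value; the function and variable names are illustrative, and only the coefficients quoted in the solution are used.

```python
x_bar, s_x = 20.0, 10.8012
b = (0.9671, -0.0502, -0.0176, 0.0062)   # coefficients of 1, x', x'^2, x'^3

def standardized_fit(x: float) -> float:
    """Fitted value using the standardized predictor x' = (x - x_bar) / s_x."""
    u = (x - x_bar) / s_x
    return b[0] + b[1] * u + b[2] * u ** 2 + b[3] * u ** 3

# Coefficients of x^3, x^2, x, and the constant after expanding powers of (x - x_bar).
c3 = b[3] / s_x ** 3
c2 = b[2] / s_x ** 2 - 3 * x_bar * b[3] / s_x ** 3
c1 = b[1] / s_x - 2 * x_bar * b[2] / s_x ** 2 + 3 * x_bar ** 2 * b[3] / s_x ** 3
c0 = b[0] - b[1] * x_bar / s_x + b[2] * x_bar ** 2 / s_x ** 2 - b[3] * x_bar ** 3 / s_x ** 3

def unstandardized_fit(x: float) -> float:
    """Fitted value using the expanded polynomial in x."""
    return c0 + c1 * x + c2 * x ** 2 + c3 * x ** 3

print(c3, c2, c1, c0)
print(standardized_fit(25), unstandardized_fit(25))
```

Because both forms describe the same curve, the fitted values (and hence SSE and $R^2$) agree, which is the point of part d.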


35. $Y' = \ln(Y) = \ln\alpha + \beta x + \gamma x^2 + \ln(\epsilon) = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon'$ where $\epsilon' = \ln(\epsilon)$, $\beta_0 = \ln(\alpha)$, $\beta_1 = \beta$, and
$\beta_2 = \gamma$. That is, we should fit a quadratic to (x, ln(y)). The resulting estimated quadratic (from computer
output) is $2.0397 + .1799x - .0022x^2$, so $\hat{\beta} = .1799$, $\hat{\gamma} = -.0022$, and $\hat{\alpha} = e^{2.0397} = 7.6883$. [The ln(y)'s
are 3.6136, 4.2499, 4.6977, 5.1773, and 5.4189, and the summary quantities can then be computed as
before.]


The mean value of y when $x_1 = 50$ and $x_2 = 3$ is $\mu_{Y \cdot 50,3} = -.800 + .060(50) + .900(3) = 4.9$ hours.


When the number of deliveries ($x_2$) is held fixed, then average change in travel time associated with a
one-mile (i.e., one unit) increase in distance traveled ($x_1$) is .060 hours. Similarly, when distance
traveled ($x_1$) is held fixed, then the average change in travel time associated with one extra delivery (i.e.,
a one unit increase in $x_2$) is .900 hours.


Under the assumption that Y follows a normal distribution, the mean and standard deviation of this
distribution are 4.9 (because $x_1 = 50$ and $x_2 = 3$) and $\sigma = .5$ (since the standard deviation is assumed to
be constant regardless of the values of $x_1$ and $x_2$). Therefore,
$P(Y \le 6) = P\left(Z \le \frac{6 - 4.9}{.5}\right) = P(Z \le 2.20) = .9861$. That is, in the long run, about 98.6% of all days
will result in a travel time of at most 6 hours.
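
The probability above can be confirmed with the normal distribution in Python's standard library. This is a small sketch using only the model coefficients and $\sigma$ stated in the solution; the variable names are illustrative.

```python
from statistics import NormalDist

# Mean travel time from the regression function at x1 = 50 miles, x2 = 3 deliveries.
mean_time = -0.800 + 0.060 * 50 + 0.900 * 3
sigma = 0.5

# P(Y <= 6) when Y is normal with this mean and standard deviation.
prob = NormalDist(mu=mean_time, sigma=sigma).cdf(6)

print(round(mean_time, 1), round(prob, 4))
```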


For $x_1 = 2$, $x_2 = 8$ (remember the units of $x_2$ are in 1000s), and $x_3 = 1$ (since the outlet has a drive-up
window), the average sales are $\hat{y} = 10.00 - 1.2(2) + 6.8(8) + 15.3(1) = 77.3$ (i.e., \$77,300).


For $x_1 = 3$, $x_2 = 5$, and $x_3 = 0$ the average sales are $\hat{y} = 10.00 - 1.2(3) + 6.8(5) + 15.3(0) = 40.4$ (i.e.,
\$40,400).


When the number of competing outlets ($x_1$) and the number of people within a 1-mile radius ($x_2$)
remain fixed, the expected sales will increase by \$15,300 when an outlet has a drive-up window.


$R^2$ = .834 means that 83.4% of the total variation in cone cell packing density (y) can be explained by a
linear regression on eccentricity ($x_1$) and axial length ($x_2$). For $H_0$: $\beta_1 = \beta_2 = 0$ vs. $H_a$: at least one $\beta \ne 0$,
the test statistic is $F = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} = \frac{.834/2}{(1 - .834)/(192 - 2 - 1)} = 475$, and the associated P-value
at df = (2, 189) is essentially 0. Hence, $H_0$ is rejected and the model is judged useful.
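
The model utility statistic depends only on $R^2$, the number of predictors k, and the sample size n, so it is convenient to wrap it in a small function. The sketch below is illustrative; `model_utility_f` is a hypothetical helper name, not something from the text or from any library.

```python
def model_utility_f(r2: float, k: int, n: int) -> float:
    """F statistic for H0: all k slope coefficients are zero, computed from R^2."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Cone cell packing density example: R^2 = .834, k = 2 predictors, n = 192.
f_stat = model_utility_f(0.834, k=2, n=192)
print(round(f_stat))
```

The same function applies unchanged to any of the other model utility tests in this chapter that are stated in terms of $R^2$.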


$\hat{y} = 35821.792 - 6294.729(1) - 348.037(25) = 20{,}826.138$ cells/mm².


For a fixed axial length ($x_2$), a 1-mm increase in eccentricity is associated with an estimated decrease in
mean/predicted cell density of 6294.729 cells/mm².


The error df = n − k − 1 = 192 − 3 = 189, so the critical CI value is $t_{.025,189} \approx z_{.025} = 1.96$. A 95% CI for
$\beta_1$ is $-6294.729 \pm 1.96(203.702) = (-6694.020, -5895.438)$.


The test statistic is $t = \frac{\hat{\beta}_2}{s_{\hat{\beta}_2}} = -2.59$; at 189 df, the 2-tailed P-value is roughly 2P(T < −2.59)
≈ 2Φ(−2.59) = 2(.0048) ≈ .01. Since .01 < .05, we reject $H_0$. After adjusting for the effect of
eccentricity ($x_1$), there is a statistically significant relationship between axial length ($x_2$) and cell
density (y). Therefore, we should retain $x_2$ in the model.


$\hat{y} = 185.49 - 45.97(2.6) - 0.3015(250) + 0.0888(2.6)(250) = 48.313$.


No, it is not legitimate to interpret $\beta_1$ in this way. It is not possible to increase the cobalt content, $x_1$,
while keeping the interaction predictor, $x_3$, fixed. When $x_1$ changes, so does $x_3$, since $x_3 = x_1 x_2$.


Yes, there appears to be a useful linear relationship between y and the predictors. We determine this 
by observing that the P-value corresponding to the model utility test is < .0001 (F test statistic = 
18.924). 


We wish to test $H_0$: $\beta_3 = 0$ vs. $H_a$: $\beta_3 \ne 0$. The test statistic is t = 3.496, with a corresponding P-value of
.0030. Since the P-value is < α = .01, we reject $H_0$ and conclude that the interaction predictor does
provide useful information about y.


A 95% CI for the mean value of surface area under the stated circumstances requires the following
quantities: $\hat{y} = 185.49 - 45.97(2) - 0.3015(500) + 0.0888(2)(500) = 31.598$. Next, $t_{.025,16} = 2.120$, so
the 95% confidence interval is $31.598 \pm (2.120)(4.69) = 31.598 \pm 9.9428 = (21.6552, 41.5408)$.


The hypotheses are $H_0$: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ vs. $H_a$: at least one $\beta_i \ne 0$. The test statistic is
$f = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} = \frac{.946/4}{(1 - .946)/20} = 87.6 > F_{.001,4,20} = 7.10$ (the F critical value for the smallest tail area
available in Table A.9), so the P-value is < .001 and we can reject $H_0$ at any significance level. We conclude that
at least one of the four predictor variables appears to provide useful information about tenacity.


The adjusted $R^2$ value is $1 - \frac{n - 1}{n - (k + 1)}\left(\frac{SSE}{SST}\right) = 1 - \frac{24}{20}(1 - .946) = .935$, which does
not differ much from $R^2$ = .946.
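
The adjusted $R^2$ formula just used can be written as a one-line function, using $SSE/SST = 1 - R^2$. The sketch below is illustrative; `adjusted_r2` is a hypothetical helper name, and n = 25 follows from n − 1 = 24 and n − (k + 1) = 20 in the calculation above.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - [(n - 1) / (n - (k + 1))] * (1 - R^2)."""
    return 1 - (n - 1) / (n - (k + 1)) * (1 - r2)

adj = adjusted_r2(0.946, n=25, k=4)
print(round(adj, 3))
```

The penalty factor (n − 1)/(n − (k + 1)) exceeds 1 whenever k ≥ 1, so adjusted $R^2$ is always smaller than $R^2$ for a model with predictors.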


The estimated average tenacity when $x_1 = 16.5$, $x_2 = 50$, $x_3 = 3$, and $x_4 = 5$ is
$\hat{y} = 6.121 - .082(16.5) + .113(50) + .256(3) - .219(5) = 10.091$. For a 99% CI, $t_{.005,20} = 2.845$, so
the interval is $10.091 \pm 2.845(.350) = (9.095, 11.087)$. Therefore, when the four predictors are as
specified in this problem, the true average tenacity is estimated to be between 9.095 and 11.087.


For a 1% increase in the percentage plastics, we would expect a 28.9 kcal/kg increase in energy
content. Also, for a 1% increase in the moisture, we would expect a 37.4 kcal/kg decrease in energy
content. Both of these assume we have accounted for the linear effects of the other three variables.


The appropriate hypotheses are $H_0$: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ vs. $H_a$: at least one $\beta_i \ne 0$. The value of the
F-test statistic is 167.71, with a corresponding P-value that is ≈ 0. So, we reject $H_0$ and conclude that at
least one of the four predictors is useful in predicting energy content, using a linear model.


$H_0$: $\beta_3 = 0$ v. $H_a$: $\beta_3 \ne 0$. The value of the t test statistic is t = 2.24, with a corresponding P-value of
.034, which is less than the significance level of .05. So we can reject $H_0$ and conclude that percentage
garbage provides useful information about energy consumption, given that the other three predictors
remain in the model.


$\hat{y} = 2244.9 + 28.925(20) + 7.644(25) + 4.297(40) - 37.354(45) = 1505.5$, and $t_{.025,25} = 2.060$. So a
95% CI for the true average energy content under these circumstances is
$1505.5 \pm (2.060)(12.47) = 1505.5 \pm 25.69 = (1479.8, 1531.1)$. Because the interval is reasonably narrow,
we would conclude that the mean energy content has been precisely estimated.


A 95% prediction interval for the energy content of a waste sample having the specified characteristics
is $1505.5 \pm (2.060)\sqrt{(31.48)^2 + (12.47)^2} = 1505.5 \pm 69.75 = (1435.7, 1575.2)$.
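
The confidence and prediction intervals for energy content differ only in the standard error used. The sketch below recomputes both half-widths from the quoted values (s = 31.48, standard error of the fit 12.47, and $t_{.025,25}$ = 2.060 from a t table); the variable names are illustrative.

```python
import math

# Fitted value at 20% plastics, 25% paper, 40% garbage, 45% moisture.
y_hat = 2244.9 + 28.925 * 20 + 7.644 * 25 + 4.297 * 40 - 37.354 * 45

t_crit = 2.060     # t_{.025,25}, read from a t table
se_fit = 12.47     # estimated sd of the fitted value
s = 31.48          # estimated error standard deviation

ci_half_width = t_crit * se_fit
pi_half_width = t_crit * math.sqrt(s ** 2 + se_fit ** 2)

print(round(y_hat, 1), round(ci_half_width, 2), round(pi_half_width, 2))
```

The prediction interval is much wider because it also accounts for the variability of a single new observation around its mean.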


Use the ANOVA table in the output to test $H_0$: $\beta_1 = \beta_2 = \beta_3 = 0$ vs. $H_a$: at least one $\beta_i \ne 0$. With f =
17.31 and P-value = 0.000, we reject $H_0$ at any reasonable significance level and conclude that the
model is useful.


Use the t test information associated with $x_3$ to test $H_0$: $\beta_3 = 0$ vs. $H_a$: $\beta_3 \ne 0$. With t = 3.96 and P-value
= .002 < .05, we reject $H_0$ at the .05 level and conclude that the interaction term should be retained.


The predicted value of y when $x_1 = 3$ and $x_2 = 6$ is $\hat{y} = 17.279 - 6.368(3) - 3.658(6) + 1.7067(3)(6) =
6.946$. With error df = 11, $t_{.025,11} = 2.201$, and the CI is $6.946 \pm 2.201(.555) = (5.73, 8.17)$.


Our point prediction remains the same, but the SE is now $\sqrt{s^2 + s_{\hat{Y}}^2} = \sqrt{1.72225^2 + .555^2} = 1.809$. The
resulting 95% PI is $6.946 \pm 2.201(1.809) = (2.97, 10.93)$.


Associated with $x_3$ = drilling depth are the test statistic t = 0.30 and P-value = .777, so we certainly do
not reject $H_0$: $\beta_3 = 0$ at any reasonable significance level. Thus, we should remove $x_3$ from the model.


To test $H_0$: $\beta_1 = \beta_2 = 0$ vs. $H_a$: at least one $\beta \ne 0$, $f = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} = \frac{.836/2}{(1 - .836)/(9 - 2 - 1)}
= 15.29$; at df = (2, 6), 10.92 < 15.29 < 27.00 ⇒ the P-value is between .001 and .01. (Software gives
.004.) In particular, P-value < .05 ⇒ reject $H_0$ at the α = .05 level: the model based on $x_1$ and $x_2$ is
useful in predicting y.


With error df = 6, $t_{.025,6} = 2.447$, and from the Minitab output we can construct a 95% CI for $\beta_1$:
$-0.006767 \pm 2.447(0.002055) = (-0.01180, -0.00174)$. Hence, after adjusting for feed rate ($x_2$), we are
95% confident that the true change in mean surface roughness associated with a 1 rpm increase in
spindle speed is between −.01180 µm and −.00174 µm.


The point estimate is $\hat{y} = 0.365 - 0.006767(400) + 45.67(.125) = 3.367$. With the standard error
provided, the 95% CI for $\mu_Y$ is $3.367 \pm 2.447(.180) = (2.93, 3.81)$.


A normal probability plot of the e* values is quite straight, supporting the assumption of normally 


distributed errors. Also, plots of the e* values against x, and x, show no discernible pattern, supporting 
the assumptions of linearity and equal variance. Together, these validate the regression model. 
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Some possible questions might be: 


(1) Is this model useful in predicting deposition of poly-aromatic hydrocarbons? A test of model
utility gives us an F = 84.39, with a P-value of 0.000. Thus, the model is useful.

(2) Is $x_1$ a significant predictor of y in the presence of $x_2$? A test of $H_0$: $\beta_1 = 0$ v. $H_a$: $\beta_1 \ne 0$ gives us a t
= 6.98 with a P-value of 0.000, so this predictor is significant.

(3) A similar question, and solution for testing $x_2$ as a predictor yields a similar conclusion: with a
P-value of 0.046, we would accept this predictor as significant if our significance level were anything
larger than 0.046.


Section 13.5 


55.


a. To test $H_0$: $\beta_1 = \beta_2 = \beta_3 = 0$ vs. $H_a$: at least one $\beta \ne 0$, use $R^2$:
$f = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} = \frac{.706/3}{(1 - .706)/(12 - 3 - 1)} = 6.40$. At df = (3, 8), 4.06 < 6.40 < 7.59 ⇒ the
P-value is between .01 and .05. In particular, P-value < .05 ⇒ reject $H_0$ at the .05 level. We conclude that
the given model is statistically useful for predicting tool productivity.


b. No: the large P-value (.510) associated with In(x3) implies that we should not reject Ho: 6; = 0, and 
hence we need not retain In(x;) in the model that already includes In(x;). 


c. Part of the Minitab output from regressing ln(y) on ln($x_1$) appears below. The estimated regression
equation is ln(y) = 3.55 + 0.844 ln($x_1$). As for utility, t = 4.69 and P-value = .001 imply that we should
reject $H_0$: $\beta_1 = 0$; that is, the stated model is useful.


The regression equation is
ln(y) = 3.55 + 0.844 ln(x1)


Predictor      Coef   SE Coef        T       P
Constant    3.55493   0.01336   266.06   0.000
ln(x1)       0.8439    0.1799     4.69   0.001


d. The residual plot shows pronounced curvature, rather than "random scatter." This suggests that the
functional form of the relationship might not be correctly modeled; that is, ln(y) might have a
non-linear relationship with ln($x_1$). [Obviously, one should investigate this further, rather than blindly
continuing with the given model!]


[Figure: standardized residuals versus ln(x1); response is ln(y)]


e. First, for the model utility test of ln($x_1$) and $\ln^2(x_1)$ as predictors, we again rely on $R^2$:
$f = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} = \frac{.819/2}{(1 - .819)/(12 - 2 - 1)} = 20.36$. Since this is greater than $F_{.001,2,9} = 16.39$, the
P-value is < .001 and we strongly reject the null hypothesis of no model utility (i.e., the utility of this
model is confirmed). Notice also the P-value associated with $\ln^2(x_1)$ is .031, indicating that this
"quadratic" term adds to the model.
Next, notice that when $x_1 = 1$, ln($x_1$) = 0 [and $\ln^2(x_1) = 0^2 = 0$], so we're really looking at the
information associated with the intercept. Using that plus the critical value $t_{.025,9} = 2.262$, a 95% PI for
the response, ln(Y), when $x_1 = 1$ is $3.5189 \pm 2.262\sqrt{.0361358^2 + .0178^2} = (3.4277, 3.6099)$. Lastly, to
create a 95% PI for Y itself, exponentiate the endpoints: at the 95% prediction level, a new value of Y
when $x_1 = 1$ will fall in the interval $(e^{3.4277}, e^{3.6099}) = (30.81, 36.97)$.
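
The two-step construction in part e (build a prediction interval for ln(Y), then exponentiate) is sketched below using the values quoted in the solution. The critical value is taken from a t table rather than computed, and the variable names are illustrative.

```python
import math

log_fit = 3.5189      # estimated intercept, i.e. the fitted ln(Y) when x1 = 1
s = 0.0361358         # residual standard deviation on the log scale
se_fit = 0.0178       # standard error of the fitted value (here, of the intercept)
t_crit = 2.262        # t_{.025,9}, read from a t table

half_width = t_crit * math.sqrt(s ** 2 + se_fit ** 2)
pi_log = (log_fit - half_width, log_fit + half_width)

# Exponentiating the endpoints gives the PI for Y itself.
pi_y = tuple(math.exp(v) for v in pi_log)

print([round(v, 4) for v in pi_log], [round(v, 2) for v in pi_y])
```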


57. 
k    $R^2$    adj. $R^2$    $C_k = \frac{SSE_k}{s^2} + 2(k+1) - n$
l .676 647 138.2 
2 .979 975 pa 
3 9819 .976 3.2 
4 9824 4 
where $s^2$ = 5.9825
a. Clearly the model with k = 2 is recommended on all counts. 
b. No. Forward selection would let x, enter first and would not delete it at the next stage. 
59.


a. The choice of a "best" model seems reasonably clear-cut. The model with 4 variables including all but
the summerwood fiber variable would seem best. $R^2$ is as large as any of the models, including the
5-variable model. Adjusted $R^2$ is at its maximum and Cp is at its minimum. As a second choice, one
might consider the model with k = 3 which excludes the summerwood fiber and springwood %
variables.


b. Backwards Stepping:


Step 1: A model with all 5 variables is fit; the smallest t-ratio is t = .12, associated with variable $x_2$
(summerwood fiber %). Since t = .12 < 2, the variable $x_2$ was eliminated.

Step 2: A model with all variables except $x_2$ was fit. Variable $x_4$ (springwood light absorption) has
the smallest t-ratio (t = −1.76), whose magnitude is smaller than 2. Therefore, $x_4$ is the next
variable to be eliminated.

Step 3: A model with variables $x_3$ and $x_5$ is fit. Both t-ratios have magnitudes that exceed 2, so both
variables are kept and the backwards stepping procedure stops at this step. The final model
identified by the backwards stepping method is the one containing $x_3$ and $x_5$.
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Forward Stepping: 


Step 1: After fitting all five 1-variable models, the model with $x_3$ had the t-ratio with the largest
magnitude (t = −4.82). Because the absolute value of this t-ratio exceeds 2, $x_3$ was the first
variable to enter the model.

Step 2: All four 2-variable models that include $x_3$ were fit. That is, the models {$x_3$, $x_1$}, {$x_3$, $x_2$},
{$x_3$, $x_4$}, {$x_3$, $x_5$} were all fit. Of all 4 models, the t-ratio 2.12 (for variable $x_5$) was largest in
absolute value. Because this t-ratio exceeds 2, $x_5$ is the next variable to enter the model.

Step 3: (not printed): All possible 3-variable models involving $x_3$ and $x_5$ and another predictor were fit. None
of the t-ratios for the added variables has absolute values that exceed 2, so no more variables are
added. There is no need to print anything in this case, so the results of these tests are not shown.


Note: Both the forwards and backwards stepping methods arrived at the same final model, {x3, xs}, in 
this problem. This often happens, but not always. There are cases when the different stepwise 
methods will arrive at slightly different collections of predictor variables. 


If multicollinearity were present, at least one of the four R’ values would be very close to 1, which is not 
the case. Therefore, we conclude that multicollinearity is not a problem in this data. 


Before removing any observations, we should investigate their source (e.g., were measurements on that
observation misread?) and their impact on the regression. To begin, Observation #7 deviates significantly
from the pattern of the rest of the data (standardized residual = −2.62); if there's concern the PAH
deposition was not measured properly, we might consider removing that point to improve the overall fit. If
the observation was not mis-recorded, we should not remove the point.


We should also investigate Observation #6: Minitab gives $h_{66} = .846 > 3(2+1)/17$, indicating this
observation has very high leverage. However, the standardized residual for #6 is not large, suggesting that
it follows the regression pattern specified by the other observations. Its "influence" only comes from
having a comparatively large x value.
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Supplementary Exercises 


65. 
a. 


[Figure: boxplots of ppv by prism quality (cracked, not cracked); means are indicated by solid circles]


A two-sample t confidence interval, generated by Minitab:

Two sample T for ppv

prism qu        N   StDev   SE Mean
cracked        12     295        85
not cracke     18     234        55

95% CI for mu (cracked) - mu (not cracke): (132, 557)


b. The simple linear regression results in a significant model, $r^2$ is .577, but we have an extreme
observation, with std resid = −4.11. Minitab output is below. Also run, but not included here, were a
model with an indicator for cracked/not cracked and a model with the indicator and an interaction
term. Neither improved the fit significantly.


The regression equation is
ratio = 1.00 - 0.000018 ppv


Predictor          Coef        StDev        T       P
Constant        1.00161      0.00204   491.18   0.000
ppv         -0.00001827   0.00000295    -6.19   0.000

S = 0.004892   R-Sq = 57.7%   R-Sq(adj) = 56.2%


Analysis of Variance

Source           DF          SS          MS       F       P
Regression        1  0.00091571  0.00091571   38.26   0.000
Residual Error   28  0.00067016  0.00002393
Total            29  0.00158587


Unusual Observations

Obs    ppv     ratio       Fit   StDev Fit    Residual   St Resid
 29   1144  0.962000  0.980704    0.001786   -0.018704     -4.11R

R denotes an observation with a large standardized residual


a. After accounting for all the other variables in the regression, we would expect the VO2max to decrease
by .0996, on average for each one-minute increase in the one-mile walk time.


b. After accounting for all the other variables in the regression, we expect males to have a VO2max that is
.6566 L/min higher than females, on average.


c. $\hat{y} = 3.5959 + .6566(1) + .0096(170) - .0996(11) - .0080(140) = 3.67$. The residual is
$y - \hat{y} = 3.15 - 3.67 = -.52$.


d. $R^2 = 1 - \frac{SSE}{SST} = .706$ (with SST = 102.3922), or 70.6% of the observed variation in VO2max can be attributed to
the model relationship.


e. To test $H_0$: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ vs. $H_a$: at least one $\beta \ne 0$, use $R^2$:
$f = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} = \frac{.706/4}{(1 - .706)/(20 - 4 - 1)} = 9.005$. At df = (4, 15), 9.005 > 8.25 ⇒ the P-value is
less than .05, so $H_0$ is rejected. It appears that the model specifies a useful relationship between
VO2max and at least one of the other predictors.


a. Based on a scatter plot (below), a simple linear regression model would not be appropriate. Because of
the slight, but obvious curvature, a quadratic model would probably be more appropriate.


b. Using a quadratic model, a Minitab-generated regression equation is
$\hat{y} = 35.423 + 1.7191x - .0024753x^2$, and a point estimate of temperature when pressure is 200 is
$\hat{y} = 280.23$. Minitab will also generate a 95% prediction interval of (256.25, 304.22). That is, we are
95% confident that when pressure is 200 psi, a single value of temperature will be between 256.25 and
304.22°F.


[Scatter plot: temperature versus pressure]

a. Using Minitab to generate the first-order regression model, we test the model utility (to see if any of
the predictors are useful), and with f = 21.03 and a P-value of .000, we determine that at least one of the
predictors is useful in predicting palladium content. Looking at the individual predictors, the P-value
associated with the pH predictor has value .169, which would indicate that this predictor is
unimportant in the presence of the others.


b. We wish to test $H_0: \beta_1 = \dots = \beta_{20} = 0$ vs. $H_a$: at least one $\beta \neq 0$. With calculated statistic f = 6.29 and
P-value .002, this model is also useful at any reasonable significance level.


c. Testing $H_0: \beta_6 = \dots = \beta_{20} = 0$ vs. $H_a$: at least one of the listed $\beta$'s $\neq 0$, the test statistic is
$f = \frac{(716.10 - 290.27)/(20 - 5)}{290.27/(32 - 20 - 1)} = 1.07 < F_{.05,15,11} = 2.72$. Thus, P-value > .05, so we fail to reject $H_0$
and conclude that the quadratic and interaction terms should not be included in the model. They do
not add enough information to make this model significantly better than the simple first-order model.
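The comparison in part c is a partial F test of a reduced model nested inside a full model. A minimal sketch, with the hypothetical helper name `partial_f` and the SSE values quoted in the solution:

```python
def partial_f(sse_reduced, sse_full, k_full, k_reduced, n):
    """Partial F statistic for testing the extra predictors in the full model."""
    extra_predictors = k_full - k_reduced
    error_df = n - (k_full + 1)
    return ((sse_reduced - sse_full) / extra_predictors) / (sse_full / error_df)


# First-order model: 5 predictors, SSE = 716.10.
# Full second-order model: 20 predictors, SSE = 290.27, n = 32.
f_partial = partial_f(sse_reduced=716.10, sse_full=290.27, k_full=20, k_reduced=5, n=32)
print(round(f_partial, 3))
```

Because the statistic falls below the critical value 2.72 at (15, 11) df, the extra terms are not retained.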


d. Partial output from Minitab follows, which shows all predictors as significant at level .05:
The regression equation is
pdconc = - 305 + 0.405 niconc + 69.3 pH - 0.161 temp + 0.993 currdens
+ 0.355 pallcont - 4.14 pHsq


Predictor       Coef     StDev       T
Constant     -304.85     93.98   -3.24
niconc       0.40484   0.09432    4.29
pH             69.27     21.96    3.15
temp        -0.16134   0.07055   -2.29
currdens      0.9929    0.3570    2.78
pallcont     0.35460   0.03381   10.49
pHsq          -4.138     1.293   -3.20


73.
a. We wish to test $H_0: \beta_1 = \beta_2 = 0$ vs. $H_a$: either $\beta_1$ or $\beta_2 \neq 0$. With $R^2 = 1 - \frac{.29}{SST} = .9986$, the test
statistic is $f = \frac{R^2/k}{(1-R^2)/(n-k-1)} = \frac{.9986/2}{(1-.9986)/(8-2-1)} = 1783$, where k = 2 for the quadratic model.
Clearly the P-value at df = (2, 5) is effectively zero, so we strongly reject $H_0$ and conclude that the
quadratic model is clearly useful.


b. The relevant hypotheses are $H_0: \beta_2 = 0$ vs. $H_a: \beta_2 \neq 0$. The test statistic value is
$t = \frac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \frac{-.00163141 - 0}{.00003391} = -48.1$; at 5 df, the P-value is $2P(T \ge |-48.1|) \approx 0$. Therefore, $H_0$ is rejected.
The quadratic predictor should be retained.


c. No. $R^2$ is extremely high for the quadratic model, so the marginal benefit of including the cubic
predictor would be essentially nil, and a scatter plot doesn't show the type of curvature associated
with a cubic model.


d. $t_{.025,5} = 2.571$, and $\hat{\beta}_0 + \hat{\beta}_1(100) + \hat{\beta}_2(100)^2 = 21.36$, so the CI is $21.36 \pm 2.571(.1141) = 21.36 \pm .29$
= (21.07, 21.65).

First, we need to figure out $s^2$ based on the information we have been given: $s^2$ = MSE = SSE/df =
.29/5 = .058. Then, the 95% PI is $21.36 \pm 2.571\sqrt{.058 + (.1141)^2} = 21.36 \pm 0.685 = (20.675, 22.045)$.
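The confidence interval and the prediction interval differ only in the term under the square root: the PI adds the error variance $s^2$ to the squared standard error of the fitted value. A short sketch under that standard formula (the function name `ci_and_pi` is illustrative):

```python
import math


def ci_and_pi(y_hat, t_crit, se_fit, mse):
    """Return (CI, PI) for a fitted value, given its standard error and the MSE."""
    ci_half = t_crit * se_fit                          # half-width for the mean response
    pi_half = t_crit * math.sqrt(mse + se_fit ** 2)    # half-width for a single new Y
    return (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


# Values from the solution: fitted value 21.36, t critical value 2.571 at 5 df,
# standard error of the fit .1141, and MSE = SSE/df = .29/5.
ci, pi = ci_and_pi(y_hat=21.36, t_crit=2.571, se_fit=0.1141, mse=0.29 / 5)
print([round(v, 3) for v in ci], [round(v, 3) for v in pi])
```

The PI is always wider than the CI at the same x value, because it must also cover the variability of an individual observation.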


To test $H_0: \beta_1 = \beta_2 = 0$ vs. $H_a$: either $\beta_1$ or $\beta_2 \neq 0$, first find $R^2$: SST = $\sum y^2 - (\sum y)^2/n$ = 264.5 $\Rightarrow$
$R^2$ = 1 - SSE/SST = 1 - 26.98/264.5 = .898. Next, $f = \frac{.898/2}{(1-.898)/(10-2-1)} = 30.8$, which at df = (2, 7)
corresponds to a P-value of $\approx 0$. Thus, $H_0$ is rejected at significance level .01 and the quadratic model
is judged useful.


The hypotheses are $H_0: \beta_2 = 0$ vs. $H_a: \beta_2 \neq 0$. The test statistic value is $t = -2.3621/.3073 = -7.69$,
and at 7 df the P-value is $2P(T \ge |-7.69|) \approx 0$. So, $H_0$ is rejected at level .001. The quadratic predictor
should not be eliminated.


x = 1 here, $\hat{\mu}_{Y \cdot 1} = \hat{\beta}_0 + \hat{\beta}_1(1) + \hat{\beta}_2(1)^2 = 45.96$, and $t_{.05,7} = 1.895$, giving the CI
$45.96 \pm (1.895)(1.031) = (44.01, 47.91)$.


The hypotheses are $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ versus $H_a$: at least one $\beta_j \neq 0$. From the output, the F
statistic is f = 4.06 with a P-value of .029. Thus, at the .05 level we reject $H_0$ and conclude that at least
one of the explanatory variables is a significant predictor of power.


Yes, a model with $R^2$ = .834 would appear to be useful. A formal model utility test can be performed:
$f = \frac{R^2/k}{(1-R^2)/[n-(k+1)]} = \frac{.834/3}{(1-.834)/[16-4]} = 20.1$, which is much greater than $F_{.05,3,12} = 3.49$. Thus,
the model including {x3, x4, x3x4} is useful.


We cannot use an F test to compare this model with the first-order model in (a), because neither model
is a "subset" of the other. Compare {x1, x2, x3, x4} to {x3, x4, x3x4}.


The hypotheses are $H_0: \beta_5 = \dots = \beta_{10} = 0$ versus $H_a$: at least one of these $\beta_j \neq 0$, where $\beta_5$ through $\beta_{10}$ are
the coefficients for the six interaction terms. The "partial F test" statistic is
$f = \frac{(SSE_l - SSE_k)/(k-l)}{SSE_k/[n-(k+1)]} = \frac{(R_k^2 - R_l^2)/(k-l)}{(1-R_k^2)/[n-(k+1)]} = \frac{(.960-.596)/(10-4)}{(1-.960)/[16-(10+1)]} = 7.58$, which is greater
than $F_{.05,6,5} = 4.95$. Hence, we reject $H_0$ at the .05 level and conclude that at least one of the interaction
terms is a statistically significant predictor of power, in the presence of the first-order terms.


There are obviously several reasonable choices in each case. In a, the model with 6 carriers is a defensible
choice on all three grounds, as are those with 7 and 8 carriers. The models with 7, 8, or 9 carriers in b
merit serious consideration. These models merit consideration because $R_k^2$, $MSE_k$, and $C_k$ meet the variable
selection criteria given in Section 13.5.


The relevant hypotheses are $H_0: \beta_1 = \dots = \beta_5 = 0$ vs. $H_a$: at least one among $\beta_1, \dots, \beta_5 \neq 0$.
$f = \frac{.827/5}{.173/111} = 106.1 \ge F_{.05,5,111} \approx 2.29$, so P-value < .05. Hence, $H_0$ is rejected in favor of the conclusion
that there is a useful linear relationship between Y and at least one of the predictors.
b. $t_{.05,111} = 1.66$, so the CI is $.041 \pm (1.66)(.016) = .041 \pm .027 = (.014, .068)$. $\beta_1$ is the expected change in
mortality rate associated with a one-unit increase in the particle reading when the other four predictors
are held fixed; we can be 90% confident that $.014 < \beta_1 < .068$.


c. In testing $H_0: \beta_4 = 0$ versus $H_a: \beta_4 \neq 0$, $t = \frac{\hat{\beta}_4 - 0}{s_{\hat{\beta}_4}} = \frac{.041}{.007} = 5.9$, with an associated P-value of $\approx 0$. So,
$H_0$ is rejected and this predictor is judged important.


d. $\hat{y} = 19.607 + .041(166) + .071(60) + .001(788) + .041(68) + .687(.95) = 99.514$, and the corresponding residual is
103 - 99.514 = 3.486.


Taking logs, the regression model is $\ln(Y) = \beta_0 + \beta_1 \ln(x_1) + \beta_2 \ln(x_2) + \epsilon'$, where $\beta_0 = \ln(\alpha)$. Relevant
Minitab output appears below.
a. From the output, $\hat{\beta}_0 = 10.8764$, $\hat{\beta}_1 = -1.2060$, $\hat{\beta}_2 = -1.3988$. In the original model, solving for $\alpha$ returns
$\hat{\alpha} = \exp(\hat{\beta}_0) = e^{10.8764} = 52{,}912.77$.


b. From the output, $R^2$ = 78.2%, so 78.2% of the total variation in ln(wear life) can be explained by a
linear regression on ln(speed) and ln(load). From the ANOVA table, a test of $H_0: \beta_1 = \beta_2 = 0$ versus
$H_a$: at least one of these $\beta$'s $\neq 0$ produces f = 42.95 and P-value = 0.000, so we strongly reject $H_0$ and
conclude that the model is useful.


c. Yes: the variable utility tests for the two variables have t = -7.05, P = 0.000 and t = -6.01, P =
0.000. These indicate that each variable is highly statistically significant.

d. With ln(50) ≈ 3.912 and ln(5) ≈ 1.609 substituted for the transformed x values, Minitab produced the
accompanying output. A 95% PI for ln(Y) at those settings is (2.652, 5.162). Solving for Y itself, the
95% PI of interest is $(e^{2.652}, e^{5.162}) = (14.18, 174.51)$.


The regression equation is
ln(y) = 10.9 - 1.21 ln(x1) - 1.40 ln(x2)


Predictor      Coef   SE Coef       T       P
Constant    10.8764    0.7872   13.82   0.000
ln(x1)      -1.2060    0.1710   -7.05   0.000
ln(x2)      -1.3988    0.2327   -6.01   0.000
S = 0.596553   R-Sq = 78.2%   R-Sq(adj) = 76.3%


Analysis of Variance


Source            DF       SS       MS       F       P
Regression         2   30.568   15.284   42.95   0.000
Residual Error    24    8.541    0.356
Total             26   39.109

Predicted Values for New Observations

New Obs     Fit   SE Fit   95% CI            95% PI
      1   3.907   0.1178   (3.663, 4.151)    (2.652, 5.162)
Section 14.1


For each part, we reject $H_0$ if the P-value is $\le \alpha$, which occurs if and only if the calculated $\chi^2$ value is greater
than or equal to the value $\chi^2_{\alpha,k-1}$ from Table A.7.
a. Since $12.25 \ge \chi^2_{.05,4} = 9.488$, P-value < .05 and we would reject $H_0$.


b. Since $8.54 < \chi^2_{.01,3} = 11.344$, P-value > .01 and we would fail to reject $H_0$.
c. Since $4.36 < \chi^2_{.10,2} = 4.605$, P-value > .10 and we would fail to reject $H_0$.


d. Since $10.20 < \chi^2_{.01,5} = 15.085$, P-value > .01 and we would fail to reject $H_0$.


The uniform hypothesis implies that $p_{i0} = \frac{1}{8} = .125$ for i = 1, ..., 8, so the null hypothesis is
$H_0: p_{10} = p_{20} = \dots = p_{80} = .125$. Each expected count is $np_{i0}$ = 120(.125) = 15, so
$\chi^2 = \frac{(12-15)^2}{15} + \dots + \frac{(10-15)^2}{15} = 4.80$. At df = 8 - 1 = 7, 4.80 < 12.01 $\Rightarrow$ P-value > .10 $\Rightarrow$ we fail to
reject $H_0$. There is not enough evidence to disprove the claim.


The observed values, expected values, and corresponding $\chi^2$ terms are:


Observed       4      15     23     25     38     31     32     14     10      8
Expected    6.67   13.33     20  26.67  33.33  33.33  26.67     20  13.33   6.67
$\chi^2$ term   1.069    .209   .450   .105   .654   .163  1.065  1.800   .832   .265


$\chi^2$ = 1.069 + ... + .265 = 6.612. With df = 10 - 1 = 9, 6.612 < 14.68 $\Rightarrow$ P-value > .10 $\Rightarrow$ we cannot reject
$H_0$. There is no significant evidence that the data are inconsistent with the previously determined
proportions.


We test $H_0: p_1 = p_2 = p_3 = p_4 = .25$ vs. $H_a$: at least one proportion $\neq .25$, and df = 3.


Cell               1        2        3        4
Observed         328      334      372      327
Expected      340.25   340.25   340.25   340.25
$\chi^2$ term    .4410    .1148   2.9627    .5160


$\chi^2$ = 4.0345, and with 3 df, P-value > .10, so we fail to reject $H_0$. The data fail to indicate a seasonal
relationship with incidence of violent crime.
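The table above can be reproduced directly from the observed counts. The sketch below computes the expected counts and the individual $\chi^2$ terms for a fully specified null hypothesis; the function name `chi_square_gof` is illustrative.

```python
def chi_square_gof(observed, null_probs):
    """Chi-squared goodness-of-fit statistic for completely specified cell probabilities."""
    n = sum(observed)
    expected = [n * p for p in null_probs]
    terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return sum(terms), expected, terms


# Observed seasonal counts from the table, tested against p1 = p2 = p3 = p4 = .25.
chi2, expected, terms = chi_square_gof([328, 334, 372, 327], [0.25] * 4)
print(round(chi2, 4), expected[0], [round(t, 4) for t in terms])
```

The statistic is compared with a chi-squared critical value at (number of cells - 1) df, here 3.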
a. Denoting the 5 intervals by $[0, c_1), [c_1, c_2), \dots, [c_4, \infty)$, we wish $c_1$ for which
$.2 = P(0 \le X \le c_1) = \int_0^{c_1} e^{-x}\,dx = 1 - e^{-c_1}$, so $c_1 = -\ln(.8) = .2231$. Then
$.2 = P(c_1 \le X \le c_2) \Rightarrow .4 = P(0 \le X \le c_2) = 1 - e^{-c_2}$, so $c_2 = -\ln(.6) = .5108$. Similarly, $c_3 = -\ln(.4)$ =
.9163 and $c_4 = -\ln(.2)$ = 1.6094. The resulting intervals are [0, .2231), [.2231, .5108), [.5108, .9163),
[.9163, 1.6094), and [1.6094, $\infty$).
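The class boundaries in part a are just quantiles of the exponential distribution. A minimal sketch, assuming the rate parameter is 1 as in the density $e^{-x}$ used above (the function name `exponential_cutoffs` is illustrative):

```python
import math


def exponential_cutoffs(num_cells, rate=1.0):
    """Boundaries that split an exponential(rate) distribution into equiprobable cells."""
    # The i-th boundary c_i solves 1 - exp(-rate * c_i) = i / num_cells.
    return [-math.log(1 - i / num_cells) / rate for i in range(1, num_cells)]


cutoffs = exponential_cutoffs(5)
print([round(c, 4) for c in cutoffs])
```

Each of the five resulting cells then has probability .2 under the null hypothesis, which is what makes every expected count equal in part b.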


b. Each expected cell count is 40(.2) = 8, and the observed cell counts are 6, 8, 10, 7, and 9, so
$\chi^2 = \frac{(6-8)^2}{8} + \dots + \frac{(9-8)^2}{8} = 1.25$. Because $1.25 < \chi^2_{.10,4} = 7.779$, even at level .10 $H_0$ cannot be
rejected; the data are quite consistent with the specified exponential distribution.


11.
a. The six intervals must be symmetric about 0, so denote the 4th, 5th and 6th intervals by [0, a), [a, b),
[b, $\infty$). The constant a must be such that $\Phi(a) = .6667$ $(= \frac{1}{2} + \frac{1}{6})$, which from Table A.3 gives a ≈ .43.
Similarly, $\Phi(b) = .8333$ implies b ≈ .97, so the six intervals are $(-\infty, -.97)$, [-.97, -.43), [-.43, 0),
[0, .43), [.43, .97), and [.97, $\infty$).


b. The six intervals are symmetric about the mean of .5. From a, the fourth interval should extend from
the mean to .43 standard deviations above the mean, i.e., from .5 to .5 + .43(.002), which gives
[.5, .50086). Thus the third interval is [.5 - .00086, .5) = [.49914, .5). Similarly, the upper endpoint of
the fifth interval is .5 + .97(.002) = .50194, and the lower endpoint of the second interval is .5 - .00194
= .49806. The resulting intervals are $(-\infty, .49806)$, [.49806, .49914), [.49914, .5), [.5, .50086),
[.50086, .50194), and [.50194, $\infty$).


c. Each expected count is 45(1/6) = 7.5, and the observed counts are 13, 6, 6, 8, 7, and 5, so $\chi^2$ = 5.53.
With 5 df, the P-value > .10, so we would fail to reject $H_0$ at any of the usual levels of significance.
There is no significant evidence to suggest that the bolt diameters are not normally distributed with
$\mu$ = .5 and $\sigma$ = .002.


Section 14.2 


13. According to the stated model, the three cell probabilities are $(1-p)^2$, $2p(1-p)$, and $p^2$, so we wish the
value of p which maximizes $(1-p)^{2n_1}[2p(1-p)]^{n_2}p^{2n_3}$. Proceeding as in Example 14.6 gives
$\hat{p} = \frac{n_2 + 2n_3}{2n} = \frac{234}{2776} = .0843$. The estimated expected cell counts are then $n(1-\hat{p})^2 = 1163.85$,
$n[2\hat{p}(1-\hat{p})] = 214.29$, $n\hat{p}^2 = 9.86$. This gives
$\chi^2 = \frac{(1212-1163.85)^2}{1163.85} + \frac{(118-214.29)^2}{214.29} + \frac{(58-9.86)^2}{9.86} = 280.3$. With df = 4 - 1 - 1 = 2, 280.3 > 13.81
$\Rightarrow$ P-value < .001 $\Rightarrow$ $H_0$ is soundly rejected. The stated model is strongly contradicted by the data.
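The estimate and the test statistic in Exercise 13 can be checked numerically. This is a sketch only: the observed counts 1212, 118, and 58 are an assumption, chosen because they are consistent with n = 1388, the estimate .0843, and the statistic 280.3 quoted in the solution. The function name is illustrative.

```python
def fit_one_parameter_model(n1, n2, n3):
    """MLE of p and chi-squared statistic for cells (1-p)^2, 2p(1-p), p^2."""
    n = n1 + n2 + n3
    p_hat = (n2 + 2 * n3) / (2 * n)     # maximizes (1-p)^(2n1) [2p(1-p)]^n2 p^(2n3)
    probs = [(1 - p_hat) ** 2, 2 * p_hat * (1 - p_hat), p_hat ** 2]
    expected = [n * q for q in probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n1, n2, n3), expected))
    return p_hat, expected, chi2


# Assumed observed counts (see the note above).
p_hat, expected, chi2 = fit_one_parameter_model(1212, 118, 58)
print(round(p_hat, 4), [round(e, 2) for e in expected], round(chi2, 1))
```

Small differences from the printed 280.3 come from the solution rounding the expected counts before squaring.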
Chapter 14: Goodness-of-Fit Tests and Categorical Data Analysis


The part of the likelihood involving $\theta$ is
$[(1-\theta)^4]^{n_1}[\theta(1-\theta)^3]^{n_2}[\theta^2(1-\theta)^2]^{n_3}[\theta^3(1-\theta)]^{n_4}[\theta^4]^{n_5} = \theta^{n_2+2n_3+3n_4+4n_5}(1-\theta)^{4n_1+3n_2+2n_3+n_4} = \theta^{233}(1-\theta)^{367}$, so the log-likelihood is
$233\ln\theta + 367\ln(1-\theta)$. Differentiating and equating to 0 yields $\hat{\theta} = \frac{233}{600} = .3883$, and $(1-\hat{\theta}) = .6117$
[note that the exponent on $\theta$ is simply the total # of successes (defectives here) in the n = 4(150) = 600
trials]. Substituting this $\hat{\theta}$ into the formula for $p_i$ yields estimated cell probabilities .1400, .3555, .3385,
.1433, and .0227. Multiplication by 150 yields the estimated expected cell counts 21.00, 53.33, 50.78,
21.50, and 3.41. The last estimated expected cell count is less than 5, so we combine the last two categories
into a single one (≥ 3 defectives), yielding estimated counts 21.00, 53.33, 50.78, 24.91, observed counts 26,
51, 47, 26, and $\chi^2 = 1.62$. With df = 4 - 1 - 1 = 2, since $1.62 < \chi^2_{.10,2} = 4.605$, the P-value > .10, and we
do not reject $H_0$. The data suggest that the stated binomial distribution is plausible.
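The binomial fit above follows a fixed recipe: estimate $\theta$, compute the expected counts, pool sparse cells, then sum the $\chi^2$ terms. A minimal sketch of that recipe using only the numbers given in the solution (variable names are illustrative):

```python
from math import comb

theta_hat = 233 / 600        # total defectives divided by total trials
trials, n_samples = 4, 150

# Estimated expected counts for 0, 1, 2, 3, 4 defectives under Bin(4, theta_hat).
expected = [
    n_samples * comb(trials, x) * theta_hat ** x * (1 - theta_hat) ** (trials - x)
    for x in range(trials + 1)
]

# The last expected count is below 5, so pool the last two cells (3 or more defectives).
pooled_expected = expected[:3] + [expected[3] + expected[4]]
observed = [26, 51, 47, 26]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, pooled_expected))
print([round(e, 2) for e in expected], round(chi2, 2))
```

One degree of freedom is lost for the estimated parameter, which is why the solution uses df = 4 - 1 - 1 = 2 after pooling.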


$\hat{\lambda} = \bar{x} = \frac{(0)(6)+(1)(24)+(2)(42)+\dots+(8)(6)+(9)(2)}{300} = 3.88$, so the estimated cell probabilities
are computed from $\hat{p}(x) = e^{-3.88}\frac{(3.88)^x}{x!}$.


x          0      1      2      3      4      5      6      7     ≥8
np(x)    6.2   24.0   46.6   60.3   58.5   45.4   29.4   16.3   13.3
obs        6     24     42     59     62     44     41     14      8


This gives $\chi^2$ = 7.789. At df = 9 - 1 - 1 = 7, 7.789 < 12.01 $\Rightarrow$ P-value > .10 $\Rightarrow$ we fail to reject $H_0$. The
Poisson model does provide a good fit.


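With $A = 2n_1 + n_4 + n_5$, $B = 2n_2 + n_4 + n_6$, and $C = 2n_3 + n_5 + n_6$, the likelihood is proportional to
$\theta_1^A \theta_2^B (1-\theta_1-\theta_2)^C$. Taking the natural log and equating both $\partial/\partial\theta_1$ and $\partial/\partial\theta_2$ to zero gives
$\frac{A}{\theta_1} = \frac{C}{1-\theta_1-\theta_2}$ and $\frac{B}{\theta_2} = \frac{C}{1-\theta_1-\theta_2}$, whence $\theta_2 = \frac{B\theta_1}{A}$. Substituting this into the first equation gives
$\theta_1 = \frac{A}{A+B+C}$, and then $\theta_2 = \frac{B}{A+B+C}$. Thus $\hat{\theta}_1 = \frac{2n_1+n_4+n_5}{2n}$, $\hat{\theta}_2 = \frac{2n_2+n_4+n_6}{2n}$, and
$(1-\hat{\theta}_1-\hat{\theta}_2) = \frac{2n_3+n_5+n_6}{2n}$.


Substituting the observed $n_i$'s yields $\hat{\theta}_1 = \frac{2(49)+20+53}{400} = .4275$, $\hat{\theta}_2 = \frac{2(26)+20+38}{400} = .2750$, and
$(1-\hat{\theta}_1-\hat{\theta}_2) = .2975$, from which $\hat{p}_1 = (.4275)^2 = .183$, $\hat{p}_2 = .076$, $\hat{p}_3 = .089$, $\hat{p}_4 = 2(.4275)(.275) = .235$,
$\hat{p}_5 = .254$, $\hat{p}_6 = .164$.


Category       1      2      3      4      5      6
np          36.6   15.2   17.8   47.0   50.8   32.8
observed      49     26     14     20     53     38


This gives $\chi^2$ = 29.1. At df = 6 - 1 - 2 = 3, this gives a P-value less than .001. Hence, we reject $H_0$.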
The Ryan-Joiner test P-value is larger than .10, so we conclude that the null hypothesis of normality cannot
be rejected. This data could reasonably have come from a normal population. This means that it would be
legitimate to use a one-sample t test to test hypotheses about the true average ratio.


23. Minitab gives r = .967, though the hand-calculated value may be slightly different because when there are
ties among the $x_{(i)}$'s, Minitab uses the same $y_i$ for each $x_{(i)}$ in a group of tied values. $c_{.10}$ = .9707, and $c_{.05}$ =
.9639, so .05 < P-value < .10. At the 5% significance level, one would have to consider population
normality plausible.


Section 14.3 


25. The hypotheses are $H_0$: there is no association between extent of binge drinking and age group vs.
$H_a$: there is an association between extent of binge drinking and age group. With the aid of software, the
calculated test statistic value is $\chi^2$ = 212.907. With all expected counts well above 5, we can compare this
value to a chi-squared distribution with df = (4 - 1)(3 - 1) = 6. The resulting P-value is ≈ 0, and so we
strongly reject $H_0$ at any reasonable level (including .01). There is strong evidence of an association
between age and binge drinking for college-age males. In particular, comparing the observed and expected
counts shows that younger men tend to binge drink more than expected if $H_0$ were true.


27. With i = 1 identified with men and i = 2 identified with women, and j = 1, 2, 3 denoting the 3 categories
L>R, L=R, L<R, we wish to test $H_0: p_{1j} = p_{2j}$ for j = 1, 2, 3 vs. $H_a: p_{1j} \neq p_{2j}$ for at least one j. The estimated
cell counts for men are 17.95, 8.82, and 13.23 and for women are 39.05, 19.18, 28.77, resulting in a test
statistic of $\chi^2$ = 44.98. With (2 - 1)(3 - 1) = 2 degrees of freedom, the P-value is < .001, which strongly
suggests that $H_0$ should be rejected.


29.
a. The null hypothesis is $H_0: p_{1j} = p_{2j} = p_{3j}$ for j = 1, 2, 3, 4, where $p_{ij}$ is the proportion of the ith
population (natural scientists, social scientists, non-academics with graduate degrees) whose degree of
spirituality falls into the jth category (very, moderate, slightly, not at all).


From the accompanying Minitab output, the test statistic value is $\chi^2$ = 213.212 with df = (3-1)(4-1) =
6, with an associated P-value of 0.000. Hence, we strongly reject $H_0$. These three populations are not
homogeneous with respect to their degree of spirituality.
Chi-Square Test: Very, Moderate, Slightly, Not At All


Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts


          Very   Moderate   Slightly   Not At All   Total
    1       56        162        198          211     627
         78.60     195.25     183.16       170.00
         6.497      5.662      1.203        9.889

    2       56        223        243          239     761
         95.39     236.98     222.30       206.33
        16.269      0.824      1.928        5.173

    3      109        164         74           28     375
         47.01     116.78     109.54       101.67
        81.752     19.098     11.533       53.384

Total      221        549        515          478    1763


Chi-Sq = 213.212, DF = 6, P-Value = 0.000
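The Minitab statistic can be reproduced from the observed counts alone, since each expected count is (row total)(column total)/(grand total). The following sketch uses the 3 x 4 table above, with the row 3 "Very" count taken as 109 so that the column total is 221; the function name is illustrative.

```python
def chi_square_two_way(table):
    """Chi-squared statistic and df for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


spirituality = [
    [56, 162, 198, 211],   # population 1
    [56, 223, 243, 239],   # population 2
    [109, 164, 74, 28],    # population 3
]
chi2, df = chi_square_two_way(spirituality)
print(round(chi2, 3), df)
```

The same function applies to tests of independence, because the statistic and df are computed identically in both settings.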


b. We're now testing $H_0: p_{1j} = p_{2j}$ for j = 1, 2, 3, 4 under the same notation. The accompanying Minitab
output shows $\chi^2$ = 3.091 with df = (2-1)(4-1) = 3 and an associated P-value of 0.378. Since this is
larger than any reasonable significance level, we fail to reject $H_0$. The data provide no statistically
significant evidence that the populations of social and natural scientists differ with respect to degree of
spirituality.


Chi-Square Test: Very, Moderate, Slightly, Not At All 


Expected counts are printed below observed counts 
Chi-Square contributions are printed below expected counts 


Very Moderate Slightly Not At All Total 


1 56 162 198 211 627 
50.59 173.92 199.21 203.28 
0.578 0.816 0.007 0.293 

2 56 223 243 239 761 
61.41 211.08 241.79 246.72 
0.476 0.673 0.006 0.242 

Total 112 385 441 450 1388 


Chi-Sq = 3.091, DF = 3, P-Value = 0.378 


The accompanying table shows the proportions of male and female smokers in the sample who began
smoking at the ages specified. (The male proportions were calculated by dividing the counts by the
total of 96; for females, we divided by 93.) The patterns of the proportions seem to be different,
suggesting there does exist an association between gender and age at first smoking.


Gender 
Male Female 
<16 0.26 0.11 
Age 16-17 0.25 0.34 
18-20 0.29 0.18 
>20 0.20 0.37 


The hypotheses, in words, are $H_0$: gender and age at first smoking are independent, versus $H_a$: gender
and age at first smoking are associated. The accompanying Minitab output provides a test statistic
value of $\chi^2$ = 14.462 at df = (2-1)(4-1) = 3, with an associated P-value of 0.002. Hence, we would
reject $H_0$ at both the .05 and .01 levels. We have evidence to suggest an association between gender
and age at first smoking.


Chi-Square Test: Male, Female


Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts


          Male   Female   Total
    1       25       10      35
         17.78    17.22
         2.934    3.029
    2       24       32      56
         28.44    27.56
         0.694    0.717
    3       28       17      45
         22.86    22.14
         1.157    1.194
    4       19       34      53
         26.92    26.08
         2.330    2.406
Total       96       93     189
Chi-Sq = 14.462, DF = 3, P-Value = 0.002
$\chi^2 = \sum\sum \frac{(N_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}} = \sum\sum \frac{N_{ij}^2 - 2\hat{E}_{ij}N_{ij} + \hat{E}_{ij}^2}{\hat{E}_{ij}} = \sum\sum \frac{N_{ij}^2}{\hat{E}_{ij}} - 2\sum\sum N_{ij} + \sum\sum \hat{E}_{ij}$, but $\sum\sum \hat{E}_{ij} = \sum\sum N_{ij} = n$, so
$\chi^2 = \sum\sum \frac{N_{ij}^2}{\hat{E}_{ij}} - n$. This formula is computationally efficient because there is only one subtraction to be
performed, which can be done as the last step in the calculation.


35. With $p_{ij}$ denoting the common value of $p_{ij1}$, $p_{ij2}$, $p_{ij3}$, and $p_{ij4}$ under $H_0$, $\hat{p}_{ij} = \frac{n_{ij\cdot}}{n}$ and $\hat{E}_{ijk} = \frac{n_k n_{ij\cdot}}{n}$, where
$n_{ij\cdot} = \sum_{k=1}^{4} n_{ijk}$ and $n = \sum_{k=1}^{4} n_k$. With four different tables (one for each region), there are 4(9 - 1) = 32
freely determined cell counts. Under $H_0$, the nine parameters $p_{11}, \dots, p_{33}$ must be estimated, but $\sum\sum p_{ij} = 1$,
so only 8 independent parameters are estimated, giving $\chi^2$ df = 32 - 8 = 24. Note: this is really a test of
homogeneity for 4 strata, each with 3x3 = 9 categories. Hence, df = (4 - 1)(9 - 1) = 24.


Supplementary Exercises 


37. There are 3 categories here: firstborn, middleborn (2nd or 3rd born), and lastborn. With $p_1$, $p_2$, and $p_3$
denoting the category probabilities, we wish to test $H_0: p_1 = .25, p_2 = .50, p_3 = .25$ because $p_2$ = P(2nd or 3rd
born) = .25 + .25 = .50. The expected counts are (31)(.25) = 7.75, (31)(.50) = 15.5, and 7.75, so
$\chi^2 = \frac{(12-7.75)^2}{7.75} + \frac{(11-15.5)^2}{15.5} + \frac{(8-7.75)^2}{7.75} = 3.65$. At df = 3 - 1 = 2, 3.65 < 5.992 $\Rightarrow$ P-value > .05 $\Rightarrow$ $H_0$ is
not rejected. The hypothesis of equiprobable birth order appears plausible.


a. For that top-left cell, the estimated expected count is (row total)(column total)/(grand total) =
(189)(406)/(852) = 90.06. Next, the chi-squared contribution is $(O - E)^2/E = (83 - 90.06)^2/90.06$ =
0.554.


b. No: From the software output, the P-value is .023 > .01. Hence, we fail to reject the null hypothesis of
"no association" at the .01 level. We have insufficient evidence to conclude that an association exists
between cognitive state and drug status. [Note: We would arrive at a different conclusion for $\alpha$ = .05.]


41. The null hypothesis $H_0: p_{ij} = p_{i\cdot}\,p_{\cdot j}$ states that level of parental use and level of student use are independent
in the population of interest. The test is based on (3 - 1)(3 - 1) = 4 df.


Estimated expected counts


119.3    57.6
 82.8    39.9
 23.9    11.5
226      109


The calculated test statistic value is $\chi^2$ = 22.4; at df = (3 - 1)(3 - 1) = 4, the P-value is < .001, so $H_0$ should
be rejected at any reasonable significance level. Parental and student use level do not appear to be
independent.
43. This is a test of homogeneity: $H_0: p_{1j} = p_{2j} = p_{3j}$ for j = 1, 2, 3, 4, 5. The given SPSS output reports the
calculated $\chi^2$ = 70.64156 and accompanying P-value (significance) of .0000. We reject $H_0$ at any
significance level. The data strongly support that there are differences in perception of odors among the
three areas.
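45. $(n_1 - np_{10})^2 = (np_{10} - n_1)^2 = (n - n_1 - n(1-p_{10}))^2 = (n_2 - np_{20})^2$. Therefore
$\chi^2 = \frac{(n_1 - np_{10})^2}{np_{10}} + \frac{(n_2 - np_{20})^2}{np_{20}} = \frac{(n_1 - np_{10})^2}{n}\left(\frac{1}{p_{10}} + \frac{1}{p_{20}}\right) = \left(\frac{n_1}{n} - p_{10}\right)^2 \cdot \frac{n}{p_{10}p_{20}} = \frac{(\hat{p}_1 - p_{10})^2}{p_{10}p_{20}/n} = z^2$.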


47.
a. Our hypotheses are $H_0$: no difference in proportion of concussions among the three groups vs. $H_a$: there
is a difference in proportion of concussions among the three groups.

Observed       Concussion   No Concussion   Total
Soccer                 45              46      91
Non Soccer             28              68      96
Control                 8              45      53
Total                  81             159     240

Expected       Concussion   No Concussion   Total
Soccer            30.7125         60.2875      91
Non Soccer        32.4            63.6         96
Control           17.8875         35.1125      53
Total             81             159          240

$\chi^2 = \frac{(45-30.7125)^2}{30.7125} + \frac{(46-60.2875)^2}{60.2875} + \frac{(28-32.4)^2}{32.4} + \frac{(68-63.6)^2}{63.6} + \frac{(8-17.8875)^2}{17.8875} + \frac{(45-35.1125)^2}{35.1125} = 19.1842$.
The df for this test is (I - 1)(J - 1) = 2, so the P-value is
less than .001 and we reject $H_0$. There is a difference in the proportion of concussions based on
whether a person plays soccer.


b. The sample correlation of r = -.220 indicates a weak negative association between "soccer exposure"
and immediate memory recall. We can formally test the hypotheses $H_0: \rho = 0$ vs. $H_a: \rho < 0$. The test
statistic is $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{-.22\sqrt{89}}{\sqrt{1-.22^2}} \approx -2.13$. At significance level $\alpha$ = .01, we would fail to reject $H_0$
and conclude that there is no significant evidence of negative association in the population.
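The test statistic for a correlation coefficient needs only r and the sample size. A minimal sketch, assuming n = 91 (the size of the soccer group in part a, so that n - 2 = 89); the function name is illustrative.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# r = -.220 from the solution; n = 91 is assumed from part a.
t_stat = correlation_t(r=-0.220, n=91)
print(round(t_stat, 2))
```

For the lower-tailed alternative, the statistic is compared with the negative of the t critical value at 89 df.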
c. We will test to see if the average score on a controlled word association test is the same for soccer and
non-soccer athletes. $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \neq \mu_2$. Since the two sample standard deviations are very
close, we will use a pooled-variance two-sample t test. From Minitab, the test statistic is t = -0.91, with
an associated P-value of 0.366 at 80 df. We clearly fail to reject $H_0$ and conclude that there is no
statistically significant difference in the average score on the test for the two groups of athletes.


d. Our hypotheses for ANOVA are $H_0$: all means are equal vs. $H_a$: not all means are equal. The test
statistic is $f = \frac{MSTr}{MSE}$.
$SSTr = 91(.30-.35)^2 + 96(.49-.35)^2 + 53(.19-.35)^2 = 3.4659$, so $MSTr = \frac{3.4659}{2} = 1.73295$.
$SSE = 90(.67)^2 + 95(.87)^2 + 52(.48)^2 = 124.2873$ and $MSE = \frac{124.2873}{237} = .5244$.
Then $f = \frac{1.73295}{.5244} = 3.30$. Using df = (2, 200) from Table A.9, the P-value is between .01 and .05. At
significance level .05, we reject the null hypothesis. There is sufficient evidence to conclude that there
is a difference in the average number of prior non-soccer concussions between the three groups.
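The ANOVA in part d is built entirely from group sizes, means, and standard deviations. The sketch below repeats that calculation, taking the grand mean .35 as given in the solution; the function name is illustrative.

```python
def anova_from_summary(sizes, means, sds, grand_mean):
    """One-way ANOVA F statistic from per-group summary statistics."""
    sstr = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    sse = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
    df_treatment = len(sizes) - 1
    df_error = sum(sizes) - len(sizes)
    f_value = (sstr / df_treatment) / (sse / df_error)
    return sstr, sse, f_value


# Soccer, non-soccer, and control groups, in that order.
sstr, sse, f_value = anova_from_summary(
    sizes=[91, 96, 53],
    means=[0.30, 0.49, 0.19],
    sds=[0.67, 0.87, 0.48],
    grand_mean=0.35,
)
print(round(sstr, 4), round(sse, 4), round(f_value, 2))
```

The error df here is 240 - 3 = 237; the solution uses the (2, 200) row of Table A.9 as the nearest tabulated value.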


49. According to Benford's law, the probability a lead digit equals x is given by $\log_{10}(1 + 1/x)$ for x = 1, ..., 9.
Let $p_i$ = the proportion of Fibonacci numbers whose lead digit is i (i = 1, ..., 9). We wish to perform a
goodness-of-fit test $H_0: p_i = \log_{10}(1 + 1/i)$ for i = 1, ..., 9. (The alternative hypothesis is that Benford's
formula is incorrect for at least one category.) The table below summarizes the results of the test.


Digit       1      2      3      4      5      6      7      8      9
Obs.#      25     16     11      7      7      5      4      6      4
Exp.#   25.59  14.97  10.62   8.24   6.73   5.69   4.93   4.35   3.89


Expected counts are calculated by $np_i = 85\log_{10}(1 + 1/i)$. Some of the expected counts are too small, so
combine 6 and 7 into one category (obs = 9, exp = 10.62); do the same to 8 and 9 (obs = 10, exp = 8.24).


The resulting chi-squared statistic is $\chi^2 = \frac{(25-25.59)^2}{25.59} + \dots + \frac{(10-8.24)^2}{8.24} = 0.92$ at df = 7 - 1 = 6 (since
there are 7 categories after the earlier combining). Software provides a P-value of .988!


We certainly do not reject $H_0$: the lead digits of the Fibonacci sequence are highly consistent with
Benford's law.
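The Benford calculation can be reproduced from the observed digit counts. This sketch follows the same pooling as the solution (digits 6 and 7 together, digits 8 and 9 together); variable names are illustrative.

```python
import math

observed = [25, 16, 11, 7, 7, 5, 4, 6, 4]     # lead digits 1 through 9
n = sum(observed)                              # 85 Fibonacci numbers

# Expected counts under Benford's law: n * log10(1 + 1/d).
expected = [n * math.log10(1 + 1 / d) for d in range(1, 10)]

# Pool digits 6-7 and 8-9 so that no expected count is too small.
obs_pooled = observed[:5] + [observed[5] + observed[6], observed[7] + observed[8]]
exp_pooled = expected[:5] + [expected[5] + expected[6], expected[7] + expected[8]]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 1
print([round(e, 2) for e in expected], round(chi2, 2), df)
```

No parameters are estimated from the data here, so the df is simply the number of pooled categories minus one.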
CHAPTER 15


Section 15.1 


Refer to Table A.13.
a. With n = 12, $P_0(S_+ \ge 56) = .102$.


b. With n = 12, 61 < 62 < 64 $\Rightarrow$ $P_0(S_+ \ge 62)$ is between .046 and .026.


c. With n = 12 and a lower-tailed test, P-value = $P_0(S_+ \ge n(n+1)/2 - s_+) = P_0(S_+ \ge 12(13)/2 - 20)$ =
$P_0(S_+ \ge 58)$. Since 56 < 58 < 60, the P-value is between .055 and .102.


d. With n = 14 and a two-tailed test, P-value = $2P_0(S_+ \ge \max\{21, 14(15)/2 - 21\}) = 2P_0(S_+ \ge 84) = .025$.


e. With n = 25 being “off the chart,” use the large-sample approximation:
$z = \frac{s_+ - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} = \frac{300 - 25(26)/4}{\sqrt{25(26)(51)/24}} = 3.7 \Rightarrow$ two-tailed P-value = $2P(Z \ge 3.7) \approx 0$.
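The large-sample approximation in part e standardizes $s_+$ using its null mean n(n+1)/4 and variance n(n+1)(2n+1)/24. A minimal sketch using only the standard library (function names are illustrative):

```python
import math


def signed_rank_z(s_plus, n):
    """Large-sample z statistic for the Wilcoxon signed-rank test."""
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24
    return (s_plus - mean) / math.sqrt(variance)


def upper_tail(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


z = signed_rank_z(s_plus=300, n=25)
p_two_tailed = 2 * upper_tail(z)
print(round(z, 2), p_two_tailed)
```

The same function covers the later exercise with n = 33 and $s_+$ = 424, where `signed_rank_z(424, 33)` gives about 2.56.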


We test $H_0: \mu = 7.39$ vs. $H_a: \mu \neq 7.39$, so a two-tailed test is appropriate. The $(x_i - 7.39)$'s are -.37, -.04,
-.05, -.22, -.11, .38, -.30, -.17, .06, -.44, .01, -.29, -.07, and -.25, from which the ranks of the three
positive differences are 1, 4, and 13. Thus $s_+$ = 1 + 4 + 13 = 18, and the two-tailed P-value is given by
$2P_0(S_+ \ge \max\{18, 14(15)/2 - 18\}) = 2P_0(S_+ \ge 87)$, which is between 2(.025) and 2(.010) or .05 and .02. In
particular, since P-value < .05, $H_0$ is rejected at level .05.


The data are paired, and we wish to test $H_0: \mu_D = 0$ vs. $H_a: \mu_D \neq 0$.
d, eS ei ae eG Oe he 29 a 1S ‘Ss 23-09 25 
rank 1 107) 12%" 33? 6* 5 iE Y fos Yio 8* 4* 9* 
$s_+ = 10 + 12 + \dots + 9 = 72$, so the 2-tailed P-value is $2P_0(S_+ \ge \max\{72, 12(13)/2 - 72\}) = 2P_0(S_+ \ge 72) <$
2(.005) = .01. Therefore, $H_0$ is rejected at level .05.


The data are paired, and we wish to test $H_0: \mu_D = .20$ vs. $H_a: \mu_D > .20$, where $\mu_D = \mu_{outdoor} - \mu_{indoor}$. Because
n = 33, we'll use the large-sample test.


  d_i    d_i - .2   rank
 0.63      0.43      23
 0.23      0.03       4
 0.96      0.76      31
 0.2       0          1
-0.02     -0.22      18
 0.03     -0.17      14
 0.87      0.67      30
 0.3       0.1        9.5
 0.31      0.11      11
 0.45      0.25      20
-0.26     -0.46      24
From the table, $s_+$ = 424, so $z = \frac{s_+ - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} = \frac{424 - 280.5}{\sqrt{3132.25}} = \frac{143.5}{55.9665} = 2.56$. The upper-tailed
P-value is $P(Z \ge 2.56) = .0052 < .05$, so we reject $H_0$. There is statistically significant evidence that the true
mean difference between outdoor and indoor concentrations exceeds .20 nanograms/m³.


When $H_0$ is true, each of the above 24 rank sequences is equally likely, which yields the distribution of D:


d        0     2     4     6     8    10    12    14    16    18    20
p(d)  1/24  3/24  1/24  4/24  2/24  2/24  2/24  4/24  1/24  3/24  1/24


Then c = 0 yields $\alpha$ = 1/24 = .042 (too small) while c = 2 implies $\alpha$ = 1/24 + 3/24 = .167, and this is the
closest we can come to achieving a .10 significance level.


Section 15.2 


11. The ordered combined sample is 163(y), 179(y), 213(y), 225(y), 229(x), 245(x), 247(y), 250(x), 286(x), and
299(x), so w = 5 + 6 + 8 + 9 + 10 = 38. With m = n = 5, Table A.14 gives P-value = $P_0(W \ge 38)$, which is
between .008 and .028. In particular, P-value < .05, so $H_0$ is rejected in favor of $H_a$.


13. Identifying x with unpolluted region (m = 5) and y with polluted region (n = 7), we wish to test the
hypotheses $H_0: \mu_1 - \mu_2 = 0$ vs. $H_a: \mu_1 - \mu_2 < 0$. The x ranks are 1, 5, 4, 6, 9, so w = 25. In this particular
order, the test is lower-tailed, so P-value = $P_0(W \ge 5(5 + 7 + 1) - 25) = P_0(W \ge 40) > .053$. So, we fail to
reject $H_0$ at the .05 level: there is insufficient evidence to conclude that the true average fluoride level is
higher in polluted areas.


15. Let $\mu_1$ and $\mu_2$ denote true average cotinine levels in unexposed and exposed infants, respectively. The
hypotheses of interest are $H_0: \mu_1 - \mu_2 = -25$ vs. $H_a: \mu_1 - \mu_2 < -25$. Before ranking, -25 is subtracted from
each $x_i$ (i.e. 25 is added to each), giving 33, 36, 37, 39, 45, 68, and 136. The corresponding x ranks in the
combined set of 15 observations are 1, 3, 4, 5, 6, 8, and 12, from which w = 1 + 3 + ... + 12 = 39. With m =
7 and n = 8, P-value = $P_0(W \ge 7(7 + 8 + 1) - 39) = P_0(W \ge 73) = .027$. Therefore, $H_0$ is rejected at the .05
level. The true average level for exposed infants appears to exceed that for unexposed infants by more than
25 (note that $H_0$ would not be rejected using level .01).
Chapter 15: Distribution-Free Procedures


Section 15.3


17. n = 8, so from Table A.15, a 95% CI (actually 94.5%) has the form $(\bar{x}_{(36-32+1)}, \bar{x}_{(32)}) = (\bar{x}_{(5)}, \bar{x}_{(32)})$. It is easily
verified that the 5 smallest pairwise averages are $\frac{5.0+5.0}{2} = 5.00$, $\frac{5.0+11.8}{2} = 8.40$, $\frac{5.0+12.2}{2} = 8.60$,
$\frac{5.0+17.0}{2} = 11.00$, and $\frac{5.0+17.3}{2} = 11.15$ (the smallest average not involving 5.0 is
$\frac{11.8+11.8}{2} = 11.8$), and the 5 largest averages are 30.6, 26.0, 24.7, 23.95, and 23.80, so the confidence
interval is (11.15, 23.80).


19. First, we must recognize this as a paired design; the eight differences (Method 1 minus Method 2) are
0.33, 0.41, -0.71, 0.19, -0.52, 0.20, -0.65, and -0.14. With n = 8, Table A.15 gives c = 32, and a 95% CI
for $\mu_D$ is $(\bar{x}_{(8(8+1)/2-32+1)}, \bar{x}_{(32)}) = (\bar{x}_{(5)}, \bar{x}_{(32)})$.

Of the 36 pairwise averages created from these 8 differences, the 5th smallest is $\bar{x}_{(5)}$ = -0.585, and the
5th-largest (aka the 32nd-smallest) is $\bar{x}_{(32)}$ = 0.025. Therefore, we are 94.5% confident the true mean
difference in extracted creosote between the two solvents, $\mu_D$, lies in the interval (-.585, .025).


21. m = n = 5 and from Table A.16, c = 21 and the 90% (actually 90.5%) interval is $(d_{ij(5)}, d_{ij(21)})$. The five
smallest $x_i - y_j$ differences are -18, -2, 3, 4, 16 while the five largest differences are 136, 123, 120, 107, 87
(construct a table like Table 15.5), so the desired interval is (16, 87).


Section 15.4 


23. Below we record in parentheses beside each observation the rank of that observation in the combined
sample.

1    5.8(3)     6.1(5)     6.4(6)     6.5(7)     7.7(10)     $r_{1\cdot} = 31$
2    7.1(9)     8.8(12)    9.9(14)    10.5(16)   11.2(17)    $r_{2\cdot} = 68$
3    5.1(1)     5.7(2)     5.9(4)     6.6(8)     8.2(11)     $r_{3\cdot} = 26$
4    9.5(13)    10.3(15)   11.7(18)   12.1(19)   12.4(20)    $r_{4\cdot} = 85$

The computed value of k is
$k = \frac{12}{20(21)}\left[\frac{31^2}{5} + \frac{68^2}{5} + \frac{26^2}{5} + \frac{85^2}{5}\right] - 3(21) = 14.06$.
At 3 df, the P-value is < .005, so we reject $H_0$.


25. The ranks are 1, 3, 4, 5, 6, 7, 8, 9, 12, 14 for the first sample; 11, 13, 15, 16, 17, 18 for the second; 2, 10,
19, 20, 21, 22 for the third; so the rank totals are 69, 90, and 94.

$k = \frac{12}{22(23)}\left[\frac{69^2}{10} + \frac{90^2}{6} + \frac{94^2}{6}\right] - 3(23) = 9.23$; at 2 df, the P-value is roughly .01.
Therefore, we reject $H_0: \mu_1 = \mu_2 = \mu_3$ at the .05 level.
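
The two Kruskal-Wallis values above (14.06 and 9.23) can be rechecked from the rank totals alone. The following is a minimal Python sketch, not part of the original solutions: the function name `kruskal_wallis_k` is hypothetical, the sample sizes are read off the rank lists, and the formula assumes no ties, so no tie correction is applied.

```python
def kruskal_wallis_k(rank_sums, sizes):
    """Kruskal-Wallis statistic from per-sample rank sums (assumes no ties)."""
    n_total = sum(sizes)
    # Sum of (rank total)^2 / (sample size) over the samples.
    weighted = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Rank totals and sample sizes as reported in the two solutions above.
k_four_samples = kruskal_wallis_k([31, 68, 26, 85], [5, 5, 5, 5])
k_three_samples = kruskal_wallis_k([69, 90, 94], [10, 6, 6])
print(round(k_four_samples, 2), round(k_three_samples, 2))  # 14.06 9.23
```

The P-values themselves still come from the chi-squared table with (number of samples - 1) degrees of freedom, as in the text.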


27. The computed value of $F_r$ is $f_r = \frac{12}{10(3)(4)}(1226) - 3(10)(4) = 2.60$. At 2 df, P-value > .10, and so we don't
reject $H_0$ at the .05 level.
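
As a quick arithmetic check of the Friedman computing formula used above, here is a small Python sketch. It is illustrative only: the function name `friedman_statistic` is hypothetical, and it assumes that the factors 10, 3, and 4 correspond to 10 blocks and 3 treatments (which is consistent with the 2 df quoted in the solution), with 1226 being the sum of squared rank totals.

```python
def friedman_statistic(sum_sq_rank_totals, treatments, blocks):
    """Friedman statistic via the computing formula.

    sum_sq_rank_totals: sum over treatments of (rank total)^2
    treatments: number of treatments (I); blocks: number of blocks (J)
    """
    i, j = treatments, blocks
    return 12.0 / (i * j * (i + 1)) * sum_sq_rank_totals - 3 * j * (i + 1)


f_r = friedman_statistic(1226, treatments=3, blocks=10)
print(round(f_r, 2))  # 2.6
```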


Supplementary Exercises 


29. Friedman's test is appropriate here. It is easily verified that $r_{1\cdot} = 28$, $r_{2\cdot} = 29$, $r_{3\cdot} = 16$, $r_{4\cdot} = 17$, from
which the defining formula gives $f_r = 9.62$ and the computing formula gives $f_r = 9.67$. Either way, at 3 df
the P-value is < .025, and so we reject $H_0: \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0$ at the .05 level. We conclude that there are
effects due to different years.


31. From Table A.16, m = n = 5 implies that c = 22 for a confidence level of 95%, so
mn - c + 1 = 25 - 22 + 1 = 4. Thus the confidence interval extends from the 4th smallest difference to the
4th largest difference. The 4 smallest differences are -7.1, -6.5, -6.1, -5.9, and the 4 largest are -3.8, -3.7,
-3.4, 3.2, so the CI is (-5.9, -3.8).


33.

a. With "success" as defined, Y is binomial with n = 20. To determine the binomial proportion p, we
realize that since 25 is the hypothesized median, 50% of the distribution should be above 25; thus
when $H_0$ is true p = .50. The upper-tailed P-value is $P(Y \ge 15$ when Y ~ Bin(20, .5)) = 1 - B(14; 20, .5)
= .021.

b. For the given data, y = (# of sample observations that exceed 25) = 12. Analogous to a, the P-value is
then $P(Y \ge 12$ when Y ~ Bin(20, .5)) = 1 - B(11; 20, .5) = .252. Since the P-value is large, we fail to
reject $H_0$; we have insufficient evidence to conclude that the population median exceeds 25.


35.

Sample:         y     x     y     y     x     x     x     y     y
Observations:   3.7   4.0   4.1   4.3   4.4   4.8   4.9   5.1   5.6
Rank:           1     3     5     7     9     8     6     4     2

The value of W' for this data is w' = 3 + 6 + 8 + 9 = 26. With m = 4 and n = 5, the upper-tailed P-value is
$P_0(W \ge 26) > .056$. Thus, $H_0$ cannot be rejected at level .05.
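
The two sign-test P-values quoted above, 1 - B(14; 20, .5) = .021 and 1 - B(11; 20, .5) = .252, are upper binomial tails and can be reproduced exactly. The sketch below is a Python illustration under that reading; the helper name `upper_tail_binomial` is hypothetical.

```python
from math import comb


def upper_tail_binomial(y, n, p=0.5):
    """P(Y >= y) for Y ~ Bin(n, p), i.e. 1 - B(y - 1; n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(y, n + 1))


p_value_15 = upper_tail_binomial(15, 20)
p_value_12 = upper_tail_binomial(12, 20)
print(round(p_value_15, 3), round(p_value_12, 3))  # 0.021 0.252
```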
CHAPTER 16


Section 16.1 
1. All ten values of the quality statistic are between the two control limits, so no out-of-control signal is 
generated. 


3. P(10 successive points inside the limits) = P(1st inside) × P(2nd inside) × ... × P(10th inside) = $(.998)^{10}$ =
.9802. P(25 successive points inside the limits) = $(.998)^{25}$ = .9512. $(.998)^{52}$ = .9011, but $(.998)^{53}$ = .8993,
so for 53 successive points the probability that at least one will fall outside the control limits when the
process is in control is 1 - .8993 = .1007 > .10.
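
The search for the run length at which the false-alarm probability first exceeds .10 can be written as a short loop. This is a minimal Python sketch assuming, as in the solution, independent points each falling inside the limits with probability .998; the function name `first_run_exceeding` is hypothetical.

```python
def first_run_exceeding(p_inside, threshold):
    """Smallest n with P(at least one of n points outside) > threshold."""
    n = 1
    while 1 - p_inside**n <= threshold:
        n += 1
    return n


print(round(0.998**10, 4), round(0.998**25, 4))  # 0.9802 0.9512
run_length = first_run_exceeding(0.998, 0.10)
print(run_length)  # 53
```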


5.

a. For the case of 4(a), with $\sigma = .02$, $C_p = \frac{USL - LSL}{6\sigma} = \frac{3.1 - 2.9}{6(.02)} = 1.67$. This is indeed a very good
capability index. In contrast, the case of 4(b) with $\sigma = .05$ has a capability index of
$C_p = \frac{3.1 - 2.9}{6(.05)} = 0.67$. This is quite a bit less than 1, the dividing line for "marginal capability."


b. For the case of 4(a), with $\mu = 3.04$ and $\sigma = .02$, $\frac{USL - \mu}{3\sigma} = \frac{3.1 - 3.04}{3(.02)} = 1$ and
$\frac{\mu - LSL}{3\sigma} = \frac{3.04 - 2.9}{3(.02)} = 2.33$, so $C_{pk}$ = min{1, 2.33} = 1.
For the case of 4(b), with $\mu = 3.00$ and $\sigma = .05$, $\frac{USL - \mu}{3\sigma} = \frac{3.1 - 3.00}{3(.05)} = .67$ and
$\frac{\mu - LSL}{3\sigma} = \frac{3.00 - 2.9}{3(.05)} = .67$, so $C_{pk}$ = min{.67, .67} = .67. Even using this mean-adjusted capability index,
process (a) is more "capable" than process (b), though $C_{pk}$ for process (a) is now right at the "marginal capability"
threshold.
c. In general, $C_{pk} \le C_p$, and they are equal iff $\mu = \frac{LSL + USL}{2}$, i.e. the process mean is the midpoint of the
spec limits. To demonstrate this, suppose first that $\mu = \frac{LSL + USL}{2}$. Then

$\frac{USL - \mu}{3\sigma} = \frac{USL - (LSL + USL)/2}{3\sigma} = \frac{2USL - (LSL + USL)}{6\sigma} = \frac{USL - LSL}{6\sigma} = C_p$, and similarly
$\frac{\mu - LSL}{3\sigma} = C_p$. In that case, $C_{pk} = \min\{C_p, C_p\} = C_p$.

Otherwise, suppose $\mu$ is closer to the lower spec limit than to the upper spec limit (but between the
two), so that $\mu - LSL < USL - \mu$. In such a case, $C_{pk} = \frac{\mu - LSL}{3\sigma}$. However, in this same case
$\mu < \frac{LSL + USL}{2}$, from which $\frac{\mu - LSL}{3\sigma} < \frac{(LSL + USL)/2 - LSL}{3\sigma} = \frac{USL - LSL}{6\sigma} = C_p$. That is, $C_{pk} < C_p$.

Analogous arguments for all other possible values of $\mu$ also yield $C_{pk} < C_p$.
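
The capability indices in parts a and b, and the inequality argued in part c, can be checked numerically. The following Python sketch is an illustration only; `cp` and `cpk` are hypothetical helper names, and the spec limits, means, and standard deviations are the ones quoted in the solution.

```python
def cp(usl, lsl, sigma):
    """Process capability index C_p = (USL - LSL) / (6 sigma)."""
    return (usl - lsl) / (6 * sigma)


def cpk(usl, lsl, mu, sigma):
    """Mean-adjusted index C_pk = min{(USL - mu), (mu - LSL)} / (3 sigma)."""
    return min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))


print(round(cp(3.1, 2.9, 0.02), 2), round(cp(3.1, 2.9, 0.05), 2))  # 1.67 0.67
print(round(cpk(3.1, 2.9, 3.04, 0.02), 2), round(cpk(3.1, 2.9, 3.00, 0.05), 2))  # 1.0 0.67

# Part c: C_pk never exceeds C_p for a mean between the spec limits.
means = [2.9 + 0.01 * step for step in range(21)]
never_exceeds = all(cpk(3.1, 2.9, m, 0.02) <= cp(3.1, 2.9, 0.02) + 1e-12 for m in means)
print(never_exceeds)  # True
```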
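Section 16.2


7.

a. P(point falls outside the limits when $\mu = \mu_0 + .5\sigma$)
$= 1 - P\left(\mu_0 - \frac{3\sigma}{\sqrt{n}} < \bar{X} < \mu_0 + \frac{3\sigma}{\sqrt{n}} \text{ when } \mu = \mu_0 + .5\sigma\right)$
$= 1 - P(-3 - .5\sqrt{n} < Z < 3 - .5\sqrt{n}) = 1 - P(-4.12 < Z < 1.882) = 1 - .9699 = .0301$.

b. $1 - P\left(\mu_0 - \frac{3\sigma}{\sqrt{n}} < \bar{X} < \mu_0 + \frac{3\sigma}{\sqrt{n}} \text{ when } \mu = \mu_0 - \sigma\right) = 1 - P(-3 + \sqrt{n} < Z < 3 + \sqrt{n})$
$= 1 - P(-.76 < Z < 5.24) = .2236$.

c. $1 - P(-3 - 2\sqrt{n} < Z < 3 - 2\sqrt{n}) = 1 - P(-7.47 < Z < -1.47) = .9292$.


9. $\bar{\bar{x}} = 12.95$ and $\bar{s} = .526$, so with $a_5 = .940$, the control limits are
$12.95 \pm 3\frac{.526}{.940\sqrt{5}} = 12.95 \pm .75 = 12.20, 13.70$. Again, every point ($\bar{x}$) is between these limits, so there is no
evidence of an out-of-control process.


11. $\bar{\bar{x}} = 96.54$, $\bar{s} = 1.264$, and $a_6 = .952$, giving the control limits
$96.54 \pm 3\frac{1.264}{.952\sqrt{6}} = 96.54 \pm 1.63 = 94.91, 98.17$. The value of $\bar{x}$ on the 22nd day lies above the UCL, so the
process appears to be out of control at that time.


13.

a. $P\left(\mu_0 - \frac{2.81\sigma}{\sqrt{n}} < \bar{X} < \mu_0 + \frac{2.81\sigma}{\sqrt{n}} \text{ when } \mu = \mu_0\right) = P(-2.81 < Z < 2.81) = .995$, so the probability that a
point falls outside the limits is .005 and ARL = $\frac{1}{.005}$ = 200.

b. P(a point is inside the limits) = $P\left(\mu_0 - \frac{2.81\sigma}{\sqrt{n}} < \bar{X} < \mu_0 + \frac{2.81\sigma}{\sqrt{n}} \text{ when } \mu = \mu_0 + \sigma\right)$ = ... =
$P(-2.81 - \sqrt{n} < Z < 2.81 - \sqrt{n}) = P(-4.81 < Z < .81)$ [when n = 4] $\approx \Phi(.81) = .7910 \Rightarrow$
p = P(a point is outside the limits) = 1 - .7910 = .209 $\Rightarrow$ ARL = $\frac{1}{.209}$ = 4.78.

c. Replace 2.81 with 3 above. For a, P(-3 < Z < 3) = .9974, so p = 1 - .9974 = .0026 and
ARL = $\frac{1}{.0026}$ = 385 for an in-control process. When $\mu = \mu_0 + \sigma$ as in b, the probability of an out-of-
control point is $1 - P(-3 - \sqrt{n} < Z < 3 - \sqrt{n}) = 1 - P(-5 < Z < 1) \approx 1 - \Phi(1) = .1587$, so
ARL = $\frac{1}{.1587}$ = 6.30.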
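
The out-of-limits probabilities and average run lengths in this section all come from the same normal calculation, which can be sketched in Python as follows. This is an illustration under the solution's assumptions (normal data, known sigma, limits at a multiple of sigma/sqrt(n)); the function names are hypothetical, and the normal cdf is evaluated with `math.erf` rather than a table, so results differ slightly from the table-rounded values. For instance, the unrounded 3-sigma in-control ARL is about 370 rather than the 385 obtained from p rounded to .0026.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def signal_probability(limit_mult, shift_in_sigmas, n):
    """P(x-bar falls outside mu0 +/- limit_mult*sigma/sqrt(n)) when the
    true mean is mu0 + shift_in_sigmas * sigma."""
    d = shift_in_sigmas * sqrt(n)
    return 1.0 - (phi(limit_mult - d) - phi(-limit_mult - d))


def average_run_length(limit_mult, shift_in_sigmas, n):
    """ARL = 1 / P(a single point signals)."""
    return 1.0 / signal_probability(limit_mult, shift_in_sigmas, n)


for mult, shift in [(2.81, 0.0), (2.81, 1.0), (3.0, 0.0), (3.0, 1.0)]:
    p = signal_probability(mult, shift, n=4)
    print(mult, shift, round(p, 4), round(1.0 / p, 2))
```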
15. $\bar{\bar{x}} = 12.95$, $\overline{IQR} = .4273$, $k_5 = .990$. The control limits are
$12.95 \pm 3\frac{.4273}{.990\sqrt{5}} = 12.95 \pm .58 = 12.37, 13.53$.

Section 16.3 
17.

a. $\bar{r} = 2.84$, $b_4 = 2.058$, and $c_4 = .880$. Since n = 4, LCL = 0 and UCL
$= 2.84 + \frac{3(.880)(2.84)}{2.058} = 2.84 + 3.64 = 6.48$.

b. $\bar{r} = 3.54$, $b_8 = 2.844$, and $c_8 = .820$, and the control limits are
$3.54 \pm \frac{3(.820)(3.54)}{2.844} = 3.54 \pm 3.06 = .48, 6.60$.

19. $\bar{s} = 1.2642$, $a_6 = .952$, and the control limits are
$1.2642 \pm \frac{3(1.2642)\sqrt{1 - (.952)^2}}{.952} = 1.2642 \pm 1.2194 = .045, 2.484$. The smallest $s_i$ is .75, and the largest
is 1.65, so every value is between .045 and 2.484. The process appears to be in control with respect to
variability.
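
The R chart and S chart limits in this section follow two closely related formulas, which the following Python sketch evaluates with the constants quoted in the solutions. The function names are hypothetical, and a negative lower limit is replaced by 0, which is how the solution treats the n = 4 case.

```python
from math import sqrt


def r_chart_limits(r_bar, b_n, c_n):
    """R chart: r_bar +/- 3 c_n r_bar / b_n, with the LCL floored at 0."""
    half_width = 3 * c_n * r_bar / b_n
    return max(0.0, r_bar - half_width), r_bar + half_width


def s_chart_limits(s_bar, a_n):
    """S chart: s_bar +/- 3 s_bar sqrt(1 - a_n^2) / a_n, with the LCL floored at 0."""
    half_width = 3 * s_bar * sqrt(1 - a_n**2) / a_n
    return max(0.0, s_bar - half_width), s_bar + half_width


print([round(v, 2) for v in r_chart_limits(2.84, 2.058, 0.880)])  # [0.0, 6.48]
print([round(v, 2) for v in r_chart_limits(3.54, 2.844, 0.820)])  # [0.48, 6.6]
print([round(v, 3) for v in s_chart_limits(1.2642, 0.952)])       # [0.045, 2.484]
```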
Section 16.4 
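21. $\bar{p} = \frac{\sum \hat{p}_i}{k}$ where $\sum \hat{p}_i = \frac{x_1}{n} + \cdots + \frac{x_k}{n} = \frac{x_1 + \cdots + x_k}{n} = \frac{578}{100} = 5.78$. Thus $\bar{p} = \frac{5.78}{25} = .231$.

a. The control limits are $.231 \pm 3\sqrt{\frac{(.231)(.769)}{100}} = .231 \pm .126 = .105, .357$.

b. $\frac{13}{100} = .130$, which is between the limits, but $\frac{39}{100} = .390$, which exceeds the upper control limit and
therefore generates an out-of-control signal.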
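
The p chart limits and the two checks in part b can be reproduced with a few lines of Python. This is a sketch under the values quoted above (center .231, sample size 100); the function name `p_chart_limits` is hypothetical.

```python
from math import sqrt


def p_chart_limits(p_bar, n):
    """p chart: p_bar +/- 3 sqrt(p_bar (1 - p_bar) / n), LCL floored at 0."""
    half_width = 3 * sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - half_width), p_bar + half_width


lcl, ucl = p_chart_limits(0.231, 100)
print(round(lcl, 3), round(ucl, 3))  # 0.105 0.357

# Proportions from part b that fall outside the limits.
signals = [p for p in (0.130, 0.390) if not lcl <= p <= ucl]
print(signals)  # [0.39]
```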


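23. LCL > 0 when $\bar{p} > 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$, i.e. (after squaring both sides) $150\bar{p}^2 > 9\bar{p}(1 - \bar{p})$, i.e. $50\bar{p} > 3(1 - \bar{p})$, i.e.
$53\bar{p} > 3 \Rightarrow \bar{p} > \frac{3}{53} = .0566$.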


25. $\sum x_i = 102$, $\bar{x} = 4.08$, and $\bar{x} \pm 3\sqrt{\bar{x}} = 4.08 \pm 6.06 \approx (-2.0, 10.1)$. Thus LCL = 0 and UCL = 10.1. Because
no $x_i$ exceeds 10.1, the process is judged to be in control.
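
Both the c chart above and the u chart of Exercise 27 use Poisson-based 3-sigma limits of the form center ± 3·sqrt(center/area). The Python sketch below evaluates them; it is illustrative only, the function name `poisson_chart_limits` is hypothetical, and the twenty $u_i$ values are copied from the Exercise 27 solution.

```python
from math import sqrt


def poisson_chart_limits(center, area=1.0):
    """Limits center +/- 3 sqrt(center / area); a negative LCL is set to 0."""
    half_width = 3 * sqrt(center / area)
    return max(0.0, center - half_width), center + half_width


# c chart (area = 1): x-bar = 4.08.
c_lcl, c_ucl = poisson_chart_limits(4.08)
print(c_lcl, round(c_ucl, 1))  # 0.0 10.1

# u chart: the u_i values listed in the Exercise 27 solution.
u = [3.75, 3.33, 3.75, 2.50, 5.00, 5.00, 12.50, 12.00, 6.67, 3.33,
     1.67, 3.75, 6.25, 4.00, 6.00, 12.00, 3.75, 5.00, 8.33, 1.67]
u_bar = sum(u) / len(u)
u_ucl = {g: poisson_chart_limits(u_bar, g)[1] for g in (0.6, 0.8, 1.0)}
print(round(u_bar, 4), {g: round(v, 1) for g, v in u_ucl.items()})
# 5.5125 {0.6: 14.6, 0.8: 13.4, 1.0: 12.6}
```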
27. With $u_i = x_i / g_i$, the $u_i$'s are 3.75, 3.33, 3.75, 2.50, 5.00, 5.00, 12.50, 12.00, 6.67, 3.33, 1.67, 3.75, 6.25, 4.00,
6.00, 12.00, 3.75, 5.00, 8.33, and 1.67 for i = 1, ..., 20, giving $\bar{u} = 5.5125$. For $g_i = .6$,
$\bar{u} \pm 3\sqrt{\bar{u}/g_i} = 5.5125 \pm 9.0933$, LCL = 0, UCL = 14.6. For $g_i = .8$, $\bar{u} \pm 3\sqrt{\bar{u}/g_i} = 5.5125 \pm 7.875$, LCL = 0,
UCL = 13.4. For $g_i = 1.0$, $\bar{u} \pm 3\sqrt{\bar{u}/g_i} = 5.5125 \pm 7.0436$, LCL = 0, UCL = 12.6. Several $u_i$'s are close to
the corresponding UCL's but none exceed them, so the process is judged to be in control.


Section 16.5


29. $\mu_0 = 16$, $k = \frac{\Delta}{2} = 0.05$, h = .20, $d_i = \max(0, d_{i-1} + (\bar{x}_i - 16.05))$, $e_i = \max(0, e_{i-1} - (\bar{x}_i - 15.95))$.

 i    $\bar{x}_i - 16.05$    $d_i$     $\bar{x}_i - 15.95$    $e_i$
 1        -0.058        0          0.024        0
 2         0.001        0.001      0.101        0
 3         0.016        0.017      0.116        0
 4        -0.138        0         -0.038        0.038
 5        -0.020        0          0.080        0
 6         0.010        0.010      0.110        0
 7        -0.068        0          0.032        0
 8        -0.151        0         -0.054        0.054
 9        -0.012        0          0.088        0
10         0.024        0.024      0.124        0
11        -0.021        0.003      0.079        0
12        -0.115        0         -0.015        0.015
13        -0.018        0          0.082        0
14        -0.090        0          0.010        0
15         0.005        0.005      0.105        0

For no time i is it the case that $d_i > .20$ or that $e_i > .20$, so no out-of-control signals are generated.


31. Connecting 600 on the in-control ARL scale to 4 on the out-of-control scale and extending to the k' scale
gives k' = .87. Thus $k' = \frac{\Delta/2}{\sigma/\sqrt{n}} = \frac{.002}{.005/\sqrt{n}}$, from which $\sqrt{n} = 2.175 \Rightarrow n = 4.73 \approx 5$. Then connecting .87
on the k' scale to 600 on the in-control ARL scale and extending to h' gives h' = 2.8, so
$h = \left(\frac{\sigma}{\sqrt{n}}\right)(h') = \left(\frac{.005}{\sqrt{5}}\right)(2.8) = .00626$.
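
The CUSUM recursions of Exercise 29 can be replayed in a few lines of Python. This is a sketch, not the original computation: the function name `cusum_paths` is hypothetical, the two deviation columns are entered exactly as tabulated, and the minus signs are an assumption, chosen so that each row agrees with the tabulated $d_i$ and $e_i$ values.

```python
def cusum_paths(upper_devs, lower_devs):
    """Run d_i = max(0, d_{i-1} + up_i) and e_i = max(0, e_{i-1} - low_i)."""
    d, e = 0.0, 0.0
    d_path, e_path = [], []
    for up, low in zip(upper_devs, lower_devs):
        d = max(0.0, d + up)
        e = max(0.0, e - low)
        d_path.append(round(d, 3))
        e_path.append(round(e, 3))
    return d_path, e_path


# Columns x-bar_i - 16.05 and x-bar_i - 15.95 (signs assumed as described above).
upper_devs = [-0.058, 0.001, 0.016, -0.138, -0.020, 0.010, -0.068, -0.151,
              -0.012, 0.024, -0.021, -0.115, -0.018, -0.090, 0.005]
lower_devs = [0.024, 0.101, 0.116, -0.038, 0.080, 0.110, 0.032, -0.054,
              0.088, 0.124, 0.079, -0.015, 0.082, 0.010, 0.105]

d_path, e_path = cusum_paths(upper_devs, lower_devs)
h = 0.20
signals = [i + 1 for i, (d, e) in enumerate(zip(d_path, e_path)) if d > h or e > h]
print(max(d_path), max(e_path), signals)  # 0.024 0.054 []
```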
Section 16.6


33. For the binomial calculation, n = 50 and we wish
$P(X \le 2) = \binom{50}{0}p^0(1-p)^{50} + \binom{50}{1}p^1(1-p)^{49} + \binom{50}{2}p^2(1-p)^{48}$ when p = .01, .02, ..., .10. For the
hypergeometric calculation, $P(X \le 2) = \frac{\binom{M}{0}\binom{500-M}{50} + \binom{M}{1}\binom{500-M}{49} + \binom{M}{2}\binom{500-M}{48}}{\binom{500}{50}}$, to be
calculated for M = 5, 10, 15, ..., 50. The resulting probabilities appear below.

p                 .01     .02     .03     .04     .05     .06     .07     .08     .09     .10
Hypergeometric    .9919   .9317   .8182   .6775   .5343   .4047   .2964   .2110   .1464   .0994
Binomial          .9862   .9216   .8108   .6767   .5405   .4162   .3108   .2260   .1605   .1117
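
The acceptance probabilities for this single-sample plan (n = 50, acceptance number 2) can be recomputed exactly with integer binomial coefficients. The Python sketch below is illustrative; the function names are hypothetical, and the lot size N = 500 is an assumption inferred from the pairing of M = 5, 10, ..., 50 with p = .01, .02, ..., .10.

```python
from math import comb


def accept_binomial(p, n=50, c=2):
    """P(X <= c) for X ~ Bin(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(c + 1))


def accept_hypergeometric(m_defective, lot_size=500, n=50, c=2):
    """P(X <= c) when sampling n without replacement from a lot with m_defective bad items."""
    favorable = sum(comb(m_defective, x) * comb(lot_size - m_defective, n - x)
                    for x in range(c + 1))
    return favorable / comb(lot_size, n)


for p in (0.01, 0.05, 0.10):
    m = round(500 * p)
    print(p, round(accept_binomial(p), 4), round(accept_hypergeometric(m), 4))
```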


35. $P(X \le 2) = \binom{100}{0}p^0(1-p)^{100} + \binom{100}{1}p^1(1-p)^{99} + \binom{100}{2}p^2(1-p)^{98}$

p             .01     .02     .03     .04     .05     .06     .07     .08     .09     .10
P(X ≤ 2)      .9206   .6767   .4198   .2321   .1183   .0566   .0258   .0113   .0048   .0019

For values of p quite close to 0, the probability of lot acceptance using this plan is larger than that for the
previous plan, whereas for larger p this plan is less likely to result in an "accept the lot" decision (the
dividing point between "close to zero" and "larger p" is someplace between .01 and .02). In this sense, the
current plan is better.


37. P(accepting the lot) = $P(X_1 = 0 \text{ or } 1) + P(X_1 = 2, X_2 = 0, 1, 2, \text{ or } 3) + P(X_1 = 3, X_2 = 0, 1, \text{ or } 2)$
= $P(X_1 = 0 \text{ or } 1) + P(X_1 = 2)P(X_2 = 0, 1, 2, \text{ or } 3) + P(X_1 = 3)P(X_2 = 0, 1, \text{ or } 2)$.

p = .01: .9106 + (.0756)(.9984) + (.0122)(.9862) = .9981
p = .05: .2794 + (.2611)(.7604) + (.2199)(.5405) = .5968
p = .10: .0338 + (.0779)(.2503) + (.1386)(.1117) = .0688



39.

a. $AOQ = pP(A) = p\left[(1-p)^{50} + 50p(1-p)^{49} + 1225p^2(1-p)^{48}\right]$

p      .01    .02    .03    .04    .05    .06    .07    .08    .09    .10
AOQ    .010   .018   .024   .027   .027   .025   .022   .018   .014   .011

b. p = .0447, AOQL = .0447P(A) = .0274

c. ATI = 50P(A) + 2000(1 - P(A))

p      .01    .02     .03     .04     .05     .06      .07      .08      .09      .10
ATI    77.3   202.1   418.6   679.9   945.1   1188.8   1393.6   1559.3   1686.1   1781.6


Supplementary Exercises 


41. n = 6, k = 26, $\sum \bar{x}_i = 10{,}980$, $\bar{\bar{x}} = 422.31$, $\sum s_i = 402$, $\bar{s} = 15.4615$, $\sum r_i = 1074$, $\bar{r} = 41.3077$

S chart: $15.4615 \pm \frac{3(15.4615)\sqrt{1 - (.952)^2}}{.952} = 15.4615 \pm 14.9141 \approx .55, 30.37$

R chart: $41.31 \pm \frac{3(.848)(41.31)}{2.536} = 41.31 \pm 41.44$, so LCL = 0, UCL = 82.75

$\bar{X}$ chart based on $\bar{s}$: $422.31 \pm \frac{3(15.4615)}{.952\sqrt{6}} = 402.42, 442.20$

$\bar{X}$ chart based on $\bar{r}$: $422.31 \pm \frac{3(41.3077)}{2.536\sqrt{6}} = 402.36, 442.26$

43.

 i    $\bar{x}_i$     $s_i$     $r_i$
 1    50.83   1.172   2.2
 2    50.10    .854   1.7
 3    50.30   1.136   2.1
 4    50.23   1.097   2.1
 5    50.33    .666   1.3
 6    51.20    .854   1.7
 7    50.17    .416    .8
 8    50.70    .964   1.8
 9    49.93   1.159   2.1
10    49.97    .473
11    50.13    .698
12    49.33    .833   1.6
13    50.23    .839   1.5
14    50.33    .404    .8
15    49.30    .265    .5
16    49.90    .854   1.7
17    50.40    .781   1.4
18    49.37    .902   1.8
19    49.87    .643   1.2
20    50.00    .794   1.5
21    50.80   2.931   5.6
22    50.43    .971   1.9

$\sum s_i = 19.706$, $\bar{s} = .8957$, $\sum \bar{x}_i = 1103.85$, $\bar{\bar{x}} = 50.175$, $a_3 = .886$, from which an s chart has LCL = 0 and
UCL = $.8957 + \frac{3(.8957)\sqrt{1 - (.886)^2}}{.886} = 2.3020$, and $s_{21} = 2.931 > UCL$. Since an assignable cause is
assumed to have been identified we eliminate the 21st group. Then $\sum s_i = 16.775$, $\bar{s} = .7988$, $\bar{\bar{x}} = 50.145$.
The resulting UCL for an s chart is 2.0529, and $s_i < 2.0529$ for every remaining i. The $\bar{x}$ chart based on $\bar{s}$
has limits $50.145 \pm \frac{3(.7988)}{.886\sqrt{3}} = 48.58, 51.71$. All $\bar{x}_i$ values are between these limits.
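
The two-pass procedure of Exercise 43 (compute the s chart limit, drop the flagged group, recompute) can be replayed from the tabulated values. The Python sketch below is illustrative: the function names are hypothetical, the subgroup size n = 3 is an assumption consistent with the quoted $a_3 = .886$, and because it carries unrounded intermediate values its results can differ from the text in the last digit.

```python
from math import sqrt

A3 = 0.886  # a_n for n = 3, as quoted in the solution

s = [1.172, 0.854, 1.136, 1.097, 0.666, 0.854, 0.416, 0.964, 1.159, 0.473, 0.698,
     0.833, 0.839, 0.404, 0.265, 0.854, 0.781, 0.902, 0.643, 0.794, 2.931, 0.971]
x_bar = [50.83, 50.10, 50.30, 50.23, 50.33, 51.20, 50.17, 50.70, 49.93, 49.97, 50.13,
         49.33, 50.23, 50.33, 49.30, 49.90, 50.40, 49.37, 49.87, 50.00, 50.80, 50.43]


def s_chart_ucl(s_bar, a_n):
    """Upper limit s_bar + 3 s_bar sqrt(1 - a_n^2) / a_n."""
    return s_bar + 3 * s_bar * sqrt(1 - a_n**2) / a_n


def x_bar_limits(center, s_bar, a_n, n):
    """Limits center +/- 3 s_bar / (a_n sqrt(n))."""
    half_width = 3 * s_bar / (a_n * sqrt(n))
    return center - half_width, center + half_width


# First pass: all 22 groups.
ucl_first = s_chart_ucl(sum(s) / len(s), A3)
flagged = [i + 1 for i, v in enumerate(s) if v > ucl_first]  # 1-based group numbers

# Second pass: drop the flagged group(s) and recompute.
keep = [i for i in range(len(s)) if i + 1 not in flagged]
s_bar_kept = sum(s[i] for i in keep) / len(keep)
center_kept = sum(x_bar[i] for i in keep) / len(keep)
ucl_second = s_chart_ucl(s_bar_kept, A3)
low, high = x_bar_limits(center_kept, s_bar_kept, A3, n=3)

print(flagged, round(ucl_first, 3), round(ucl_second, 3), round(low, 2), round(high, 2))
```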


45. $\sum n_i = 4(16) + (3)(4) = 76$, $\sum n_i \bar{x}_i = 32{,}729.4$, $\bar{\bar{x}} = 430.65$,

$s^2 = \frac{\sum (n_i - 1)s_i^2}{\sum (n_i - 1)} = \frac{27{,}380.16 + 5661.4}{76 - 20} = 590.0279$, so s = 24.2905. For variation: when n = 3,
UCL = $24.2905 + \frac{3(24.2905)\sqrt{1 - (.886)^2}}{.886} = 24.29 + 38.14 = 62.43$; when n = 4,
UCL = $24.2905 + \frac{3(24.2905)\sqrt{1 - (.921)^2}}{.921} = 24.29 + 30.82 = 55.11$. For location: when n = 3,
$430.65 \pm 47.49 = 383.16, 478.14$; when n = 4, $430.65 \pm 39.56 = 391.09, 470.21$.